Preface


This book is about Bohmian mechanics, a non-relativistic quantum theory based 
on a particle ontology. As such it is a consistent theory of quantum phenomena, i.e., 
none of the mysteries and paradoxes which plague the usual descriptions of quantum 
mechanics arise. The most important message of this book is that quantum mechan- 
ics, as defined by its most general mathematical formalism, finds its explication in 
the statistical analysis of Bohmian mechanics following Boltzmann’s ideas. 

The book connects the physics with the abstract mathematical formalism and 
prepares all that is needed to achieve a commonsense understanding of the non- 
relativistic quantum world. Therefore this book may be of interest to both physicists 
and mathematicians. The latter, who usually aim at unerring precision, are often 
put off by the mystical-sounding phrases surrounding the abstract mathematics of 
quantum mechanics. In this book we aim at a precision which will also be acceptable 
to mathematicians. 

Bohmian mechanics, named after its inventor David Bohm,¹ is about the motion
of particles. The positions of particles constitute the primitive variables, the primary 
ontology. For a quantum physicist the easiest way to grasp Bohmian mechanics 
and to write down its defining equations is to apply the dictum: Whenever you say 
particle, mean it! 

The key insight for analyzing Bohmian mechanics lies within the foundations of 
statistical physics. The reader will find it worth the trouble to work through Chap. 2 
on classical physics and Chap. 4 on chance, which are aimed at the understanding 
of the statistical analysis of a physical theory as it was developed by the great physi- 
cist Ludwig Boltzmann. Typicality is the ticket to get to the statistical import of 
Bohmian mechanics, which is succinctly captured by $\rho = |\psi|^2$. The justification for
this and the analysis of its consequences are two central points in the book. One major
consequence is the emergence of the abstract mathematical structure of quantum
mechanics, observables as operators on Hilbert space, POVMs, and Heisenberg’s
uncertainty relation. All this and more follows from the theory of particles in motion.

¹ The equations were in fact already written down by the mathematical physicist Erwin Madelung,
even before the famous physicist Louis de Broglie suggested the equations at the famous 1927
Solvay conference. But these are unimportant historical details, which have no significance for the
understanding of the theory. David Bohm was not aware of these early attempts, and moreover
he presented the full implications of the theory for the quantum formalism. The theory is also
called the pilot wave theory or the de Broglie–Bohm theory, but Bohm himself called it the causal
interpretation of quantum mechanics.

But this is not the only reason for the inclusion of some purely mathematical 
chapters. Schrödinger’s cat story is world famous. A common argument to diminish
the measurement problem is that the quantum mechanical description of a cat in a 
box is so complicated, the mathematics so extraordinarily involved, that nobody has 
done that yet. But, so the argument continues, it could in principle be done, and if 
all the mathematics is done properly and if one introduces all the observables with 
their domains of definitions in a proper manner, in short, if one does everything in 
a mathematically correct way, then there is no measurement problem. This answer 
also appears in disguise as the claim that decoherence solves the measurement prob- 
lem. But this is false! It is precisely because one can in principle describe a cat in 
a box quantum mechanically that the problem is there and embarrassingly plain to 
see. We have included all the mathematics required to ensure that no student of the 
subject can be tricked into believing that everything in quantum physics would be 
alright if only the mathematics were done properly. 

Bohmian mechanics has been around since 1952. It was promoted by John Bell 
in the second half of the last century. In particular, it was the manifestly nonlocal 
structure of Bohmian mechanics that led Bell to his celebrated inequalities, which 
allow us to check experimentally whether nature is local. Experiments have proved 
that nature is nonlocal, just as Bohmian mechanics predicted. Nevertheless there 
was once a time when physicists said that Bell’s inequalities proved that Bohmian 
mechanics was impossible. In fact, all kinds of criticisms have been raised against 
Bohmian mechanics. Since Bohmian mechanics is so simple and straightforward, 
only one criticism remains: there must be something wrong with Bohmian mechan- 
ics, otherwise it would be taught. And as a consequence, Bohmian mechanics is not 
taught because there must be something wrong with it, otherwise it would be taught. 
We try in this book to show how Bohmian mechanics could be taught. 

Any physicist who is ready to quantize everything under his pen should know 
what quantization means in the simplest and established frame of non-relativistic 
physics, and learn what conclusions should be drawn from that. The one conclusion 
which cannot be drawn is that nothing exists, or more precisely, that what exists 
cannot be named within a mathematically consistent theory! For indeed it can! The 
lesson here is that one should never give up ontology! If someone says: “I do not 
know what it means to exist,” then that is fine. That person can view the theory of 
Bohmian mechanics as a precise and coherent mathematical theory, in which all that 
needs to be said is written in the equations, ready for analysis. 

Our guideline for writing the book was the focus on the genesis of the ideas and 
concepts, to be clear about what it is that we are talking about, and hence to pave the 
way for the hard technical work of learning how it is done. In short, we have tried 
not to leave out the letter ‘h’ (see the Melville quote in Sect. 1.1.1).
Chapter 1
Introduction 


1.1 Ontology: What There Is 


1.1.1 Extracts 


Sometimes quantization is seen as the procedure that puts hats on classical observ- 
ables to turn them into quantum observables. 


Lewis Carroll on Hatters and Cats 


Lewis Carroll (alias Charles Lutwidge Dodgson, 1832–1898) was professor of mathematics
at Oxford (where Schrödinger wrote his famous cat article):


The Cat only grinned when it saw Alice. It looked good-natured, she thought: still it had 
VERY long claws and a great many teeth, so she felt that it ought to be treated with respect. 
“Cheshire Puss,” she began, rather timidly, as she did not at all know whether it would like 
the name: however, it only grinned a little wider. “Come, it’s pleased so far,” thought Alice, 
and she went on. “Would you tell me, please, which way I ought to go from here?” “That 
depends a good deal on where you want to get to,” said the Cat. “I don’t much care where 
—” said Alice. “Then it doesn’t matter which way you go,” said the Cat. “— so long as I get 
SOMEWHERE,” Alice added as an explanation. “Oh, you’re sure to do that,” said the Cat, 
“if you only walk long enough.” Alice felt that this could not be denied, so she tried another 
question. “What sort of people live about here?” “In THAT direction,” the Cat said, waving 
its right paw round, “lives a Hatter: and in THAT direction,” waving the other paw, “lives a 
March Hare. Visit either you like: they’re both mad.” “But I don’t want to go among mad 
people,” Alice remarked. “Oh, you can’t help that,” said the Cat: “we’re all mad here. I’m
mad. You’re mad.” “How do you know I’m mad?” said Alice. “You must be,” said the Cat, 
“or you wouldn’t have come here.” Alice didn’t think that proved it at all; however, she 
went on “And how do you know that you’re mad?” “To begin with,” said the Cat, “a dog’s 
not mad. You grant that?” “I suppose so,” said Alice. “Well, then,” the Cat went on, “you 
see, a dog growls when it’s angry, and wags its tail when it’s pleased. Now I growl when 
I’m pleased, and wag my tail when I’m angry. Therefore I’m mad.” “I call it purring, not
growling,” said Alice. “Call it what you like,” said the Cat. “Do you play croquet with the
Queen today?” “I should like it very much,” said Alice, “but I haven’t been invited yet.”
“You'll see me there,” said the Cat, and vanished. 


Alice was not much surprised at this, she was getting so used to queer things happening. 
While she was looking at the place where it had been, it suddenly appeared again. 


Alice’s Adventures in Wonderland (1865) [1]. 


Parmenides on What There Is 


The Greek philosopher Parmenides of Elea wrote as follows in the sixth century 
BC: 


Come now, I will tell thee — and do thou hearken to my saying and carry it away — the 
only two ways of search that can be thought of. The first, namely, that It is, and that it is 
impossible for it not to be, is the way of belief, for truth is its companion. The other, namely, 
that It is not, and that it must needs not be, — that, I tell thee, is a path that none can learn 
of at all. For thou canst not know what is not — that is impossible — nor utter it; for it is the 
same thing that can be thought and that can be. 


It needs must be that what can be spoken and thought is; for it is possible for it to be, and 
it is not possible for what is nothing to be. This is what I bid thee ponder. I hold thee back 
from this first way of inquiry, and from this other also, upon which mortals knowing naught 
wander two-faced; for helplessness guides the wandering thought in their breasts, so that 
they are borne along stupefied like men deaf and blind. Undiscerning crowds, who hold that 
it is and is not the same and not the same, and all things travel in opposite directions! 


For this shall never be proved, that the things that are not are; and do thou restrain thy 
thought from this way of inquiry. 


The Way of Truth [2]. 


It is sometimes said that, in quantum mechanics, the observer calls things into being 
by the act of observation. 


Schrödinger on Quantum Mechanics


Erwin Schrödinger wrote in 1935:


One can even set up quite ridiculous cases. A cat is penned up in a steel chamber, along 
with the following device (which must be secured against direct interference by the cat): 
in a Geiger counter there is a tiny bit of radioactive substance, so small, that perhaps in 
the course of the hour one of the atoms decays, but also, with equal probability, perhaps 
none; if it happens, the counter tube discharges and through a relay releases a hammer 
which shatters a small flask of hydrocyanic acid. If one has left this entire system to itself 
for an hour, one would say that the cat still lives if meanwhile no atom has decayed. The 
psi-function of the entire system would express this by having in it the living and dead cat 
(pardon the expression) mixed or smeared out in equal parts. 


It is typical of these cases that an indeterminacy originally restricted to the atomic domain 
becomes transformed into macroscopic indeterminacy, which can then be resolved by direct 
observation. That prevents us from so naively accepting as valid a “blurred model” for
representing reality. In itself it would not embody anything unclear or contradictory. There
is a difference between a shaky or out-of-focus photograph and a snapshot of clouds and 
fog banks. 


Die gegenwärtige Situation in der Quantenmechanik [3].
Translated by J.D. Trimmer in [4]. 


The difference between a shaky photograph and a snapshot of clouds and fog banks 
is the difference between Bohmian mechanics and quantum mechanics. In itself a 
“blurred model” representing reality would not embody anything unclear, whereas 
resolving the indeterminacy by direct observation does. As if observation were not 
part of physics. 


Feynman on Quantum Mechanics 


Feynman said the following: 


Does this mean that my observations become real only when I observe an observer observ- 
ing something as it happens? This is a horrible viewpoint. Do you seriously entertain the 
thought that without observer there is no reality? Which observer? Any observer? Is a fly an 
observer? Is a star an observer? Was there no reality before 10^9 B.C. before life began? Or
are you the observer? Then there is no reality to the world after you are dead? I know a num- 
ber of otherwise respectable physicists who have bought life insurance. By what philosophy 
will the universe without man be understood? 


Lecture Notes on Gravitation [5]. 


Bell on Quantum Mechanics 


According to John S. Bell: 


It would seem that the theory is exclusively concerned about “results of measurement”, 
and has nothing to say about anything else. What exactly qualifies some physical systems 
to play the role of “measurer”? Was the wavefunction of the world waiting to jump for 
thousands of years until a single-celled living creature appeared? Or did it have to wait a 
little longer, for some better qualified system [...] with a Ph.D.? If the theory is to apply to 
anything but highly idealized laboratory operations, are we not obliged to admit that more 
or less “measurement-like” processes are going on more or less all the time, more or less 
everywhere? Do we not have jumping then all the time? 


Against “measurement” [6]. 


Einstein on Measurements 


But it is in principle quite false to base a theory solely on observable quantities. Since, in 
fact, it is the other way around. It is the theory that decides what we can observe. 


Albert Einstein, cited by Werner Heisenberg¹ in [7].


¹ Quoted from Die Quantenmechanik und ein Gespräch mit Einstein. Translation by Detlef Dürr.


Melville on Omissions


While you take in hand to school others, and to teach them by what name a whale-fish is 
to be called in our tongue leaving out, through ignorance, the letter H, which almost alone 
maketh the signification of the word, you deliver that which is not true. 


Melville (1851), Moby Dick; or, The Whale, Etymology [8]. 


1.1.2 In Brief: The Problem of Quantum Mechanics 


Schrödinger remarks in his laconic way that there is a difference between a shaky
or out-of-focus photograph and a snapshot of clouds and fog banks. 

The first problem with quantum mechanics is that it is not about what there is. It 
is said to be about the microscopic world of atoms, but it does not spell out which 
physical quantities in the theory describe the microscopic world. Which variables 
specify what is microscopically there? Quantum mechanics is about the wave func- 
tion. The wave function lives on configuration space, which is the coordinate space 
of all the particles participating in the physical process of interest. That the wave 
function lives on configuration space has been called by Schrödinger entanglement
of the wave function. The wave function obeys a linear equation, the Schrödinger
equation. The linearity of the Schrödinger equation prevents the wave function from
representing reality.² We shall see that in a moment. The second problem of quantum
mechanics is that the first problem provokes many rich answers, which to the
untrained ear appear to be of philosophical nature, but which leave the problem 
unanswered: What is it that quantum mechanics is about? 

It is said for example that the virtue of quantum mechanics is that it is only about 
what can be measured. Moreover, that what can be measured is defined through 
the measurement. Without measurement there is nothing there. But a measurement 
belongs to the macroscopic world (which undeniably exists), and its macroscopic 
constituents like the measurement apparatus are made out of atoms, which quantum 
mechanics is supposed to describe, so this entails that the apparatus itself is to be de- 
scribed quantum mechanically. This circularity lies at the basis of the measurement 
problem of quantum mechanics, which is often phrased as showing incompleteness 
of quantum mechanics. Some things, namely the objects the theory is about, have 
been left out of the description, or the description (the Schrödinger equation) is
not right. Mathematically the measurement problem may be presented as follows.
Suppose a system is described by linear combinations of wave functions $\varphi_1$ and $\varphi_2$.
Suppose there exists a piece of apparatus which, when brought into interaction with
the system, measures whether the system has wave function $\varphi_1$ or $\varphi_2$. Measuring
means that, next to the 0 pointer position, the apparatus has two pointer positions 1
and 2 “described” by wave functions $\Psi_0$, $\Psi_1$, and $\Psi_2$, for which


$$\varphi_i \Psi_0 \;\xrightarrow{\text{Schrödinger evolution}}\; \varphi_i \Psi_i , \quad i = 1, 2. \tag{1.1}$$


² Even if the equation were nonlinear, as is the case in reduction models, the wave function living
on configuration space could not by itself represent reality in physical space. There must still be
some “beable” (related to the wave function) in the sense of Bell [9] representing physical reality.
But we ignore this fine point here.


When we say that the pointer positions are “described” by wave functions, we mean 
that in a loose sense. The wave function has a support in configuration space which 
corresponds classically to a set of coordinates of particles which would form a 
pointer. 

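The Schrödinger equation is linear, so for the superposition, (1.1) yields


$$\varphi = c_1\varphi_1 + c_2\varphi_2 , \quad c_1, c_2 \in \mathbb{C} , \quad |c_1|^2 + |c_2|^2 = 1 ,$$


$$\varphi\Psi_0 = (c_1\varphi_1 + c_2\varphi_2)\Psi_0 \;\xrightarrow{\text{Schrödinger evolution}}\; c_1\varphi_1\Psi_1 + c_2\varphi_2\Psi_2 . \tag{1.2}$$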


The outcome on the right does not concur with experience. It shows rather a “macroscopic
indeterminacy”. In the words of Schrödinger, observation then resolves this
macroscopic indeterminacy, since one only observes either 1 (with probability $|c_1|^2$)
or 2 (with probability $|c_2|^2$), i.e., observation resolves the blurred description of reality
into one where the wave function is either $\varphi_1\Psi_1$ or $\varphi_2\Psi_2$. In Schrödinger’s cat
thought experiment $\varphi_{1,2}$ are the wave functions of the non-decayed and decayed
atom and $\Psi_{0,1}$ are the wave functions of the live cat and $\Psi_2$ is the wave function of
the dead cat. Schrödinger says that this is unacceptable. But why? Is the apparatus
not supposed to be the observer? What qualifies us better than the apparatus, which 
we designed in such a way that it gives a definite outcome and not a blurred one? 

The question is: what in the theory describes the actual facts? Either those vari- 
ables which describe the actual state of affairs have been left out of the description 
(Bohmian mechanics makes amends for that) or the Schrödinger equation which
yields the unrealistic result (1.2) from (1.1) is false. (GRW theories, or more gen- 
erally, dynamical reduction models, follow the latter way of describing nature. The 
wave function collapses by virtue of the dynamical law [10].) 

The evolution in (1.2) is an instance of the so-called decoherence. The apparatus
decoheres the superposition $c_1\varphi_1 + c_2\varphi_2$ of the system wave function. Decoherence
means that it is in a practical sense impossible to get the two wave packets $\varphi_1\Psi_1$ and
$\varphi_2\Psi_2$ superposed in $c_1\varphi_1\Psi_1 + c_2\varphi_2\Psi_2$ to interfere. It is sometimes said that, taking
decoherence into account, there would not be any measurement problem. Decoherence
is this practical impossibility, which Bell referred to as fapp-impossibility
(where fapp means for all practical purposes), of the interference of the pointer
wave functions. It is often dressed up, for better looks, in terms of density matrices.
In Dirac’s notation, the density matrix is


$$\rho_G = |c_1|^2 \, |\varphi_1\rangle|\Psi_1\rangle\langle\Psi_1|\langle\varphi_1| + |c_2|^2 \, |\varphi_2\rangle|\Psi_2\rangle\langle\Psi_2|\langle\varphi_2| .$$


This can be interpreted as describing a statistical mixture of the two states $|\varphi_1\rangle|\Psi_1\rangle$
and $|\varphi_2\rangle|\Psi_2\rangle$, and one can say that in a fapp sense the pure state given by the right-hand
side of (1.2), namely


$$\rho = |c_1\varphi_1\Psi_1 + c_2\varphi_2\Psi_2\rangle\langle c_1\varphi_1\Psi_1 + c_2\varphi_2\Psi_2| ,$$

is close to $\rho_G$. They are fapp-close because the off-diagonal element


$$c_1 c_2^* \, |\varphi_1\rangle|\Psi_1\rangle\langle\Psi_2|\langle\varphi_2| ,$$


where the asterisk denotes complex conjugation, would only be observable if the 
wave parts could be brought into interference, which is fapp-impossible. The argu- 
ment then concludes that the meaning of the right-hand side of (1.2) is fapp-close 
to the meaning of the statistical mixture. To emphasise the fact that such arguments 
miss the point of the exercise, Schrödinger made his remark that there is a difference
between a shaky or out-of-focus photograph and a snapshot of clouds and fog
banks. 
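
The comparison between the pure state and the statistical mixture can be made concrete in a small toy model. The following is only a sketch under stated assumptions: the system and the pointer are each reduced to two orthonormal basis vectors (standing in for the system states 1, 2 and the pointer states 1, 2), the coefficient values are arbitrary, and the helper names `outer` and `pure_and_mixture` are hypothetical.

```python
# Toy model (an assumption): joint basis ordered as
# phi_1 Psi_1, phi_1 Psi_2, phi_2 Psi_1, phi_2 Psi_2.

def outer(v, w):
    """Return the matrix |v><w| for two coordinate vectors."""
    return [[a * b.conjugate() for b in w] for a in v]

def pure_and_mixture(c1, c2):
    """Density matrix of the superposition in (1.2) and of the mixture."""
    # Pure state c1 phi_1 Psi_1 + c2 phi_2 Psi_2.
    state = [c1, 0j, 0j, c2]
    rho = outer(state, state)
    # Mixture: weights |c1|^2 and |c2|^2 on the two product states.
    e11 = [1 + 0j, 0j, 0j, 0j]
    e22 = [0j, 0j, 0j, 1 + 0j]
    p11, p22 = outer(e11, e11), outer(e22, e22)
    rho_g = [[abs(c1) ** 2 * p11[i][j] + abs(c2) ** 2 * p22[i][j]
              for j in range(4)] for i in range(4)]
    return rho, rho_g

c1, c2 = complex(0.6), complex(0, 0.8)  # illustrative values, |c1|^2 + |c2|^2 = 1
rho, rho_g = pure_and_mixture(c1, c2)
diff = [[rho[i][j] - rho_g[i][j] for j in range(4)] for i in range(4)]

# The only nonzero entries of the difference are the two off-diagonal terms.
print(diff[0][3], c1 * c2.conjugate())
print(diff[3][0], c2 * c1.conjugate())
```

In this toy model the two matrices agree on the diagonal and differ only in the interference terms, which is all that the fapp-closeness argument relies on.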

To sum up then, decoherence does not create the facts of our world, but rather 
produces a sequence of fapp-redundancies, which physically increase or stabilize 
decoherence. So the cat decoheres the atom, the observer decoheres the cat that 
decoheres the atom, the environment of the observer decoheres the observer that 
decoheres the cat that decoheres the atom and so on. In short, what needs to be 
described by the physical theory is the behavior of real objects, located in physical 
space, which account for the facts. 


1.1.3 In Brief: Bohmian Mechanics 


Bohmian mechanics is about point particles in motion. The theory was invented 
in 1952 by David Bohm (1917–1992) [11] (there is a bit more on the history in
Chap. 7). In a Bohmian universe everything is made out of particles. Their motion 
is guided by the wave function. That is why the wave function is there. That is its 
role. The physical theory is formulated with the variables $q_i \in \mathbb{R}^3$, $i = 1, 2, \ldots, N$,
the positions of the $N$ particles which make up the system, and the wave function
$\psi(q_1, \ldots, q_N)$ on the configuration space of the system. If the wave function
consists of two parts which have disjoint supports in configuration space, then the
system configuration is in one or the other support. In the measurement example,
the pointer configuration is either in the support supp $\Psi_1$ of $\Psi_1$ (pointing out 1) or in
the support supp $\Psi_2$ of $\Psi_2$ (pointing out 2).

Bohmian mechanics happens to be deterministic. A substantial success of Bohmian 
mechanics is the explanation of quantum randomness or Born’s statistical law, on 
the basis of Boltzmann’s principles of statistical mechanics, i.e., Born’s law is not 
an axiom but a theorem in Bohmian mechanics. Born’s statistical law concerning
$\rho = |\psi|^2$ says that, if the wave function is $\psi$, the particle configuration is $|\psi|^2$-distributed.
Applying this to (1.2) implies that the result $i$ comes with probability
$|c_i|^2$. Suppose the enlarged system composed of system plus apparatus is described
by the configuration coordinates $q = (x, y)$, where $x \in \mathbb{R}^m$ is the system configuration
and $y \in \mathbb{R}^n$ the pointer configuration. We compute the probability for the result
1 by Born’s law. This means we compute the probability that the true (or actual)
pointer configuration $Y$ lies in the support supp $\Psi_1$. In the following computation,
we assume that the wave functions are all normalized to unity. Then


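$$\begin{aligned}
\mathbb{P}(\text{pointer on 1}) &= \int_{\operatorname{supp}\Psi_1} |c_1\varphi_1\Psi_1 + c_2\varphi_2\Psi_2|^2 \, d^m x \, d^n y && \text{(1.3a)} \\
&= |c_1|^2 \int_{\operatorname{supp}\Psi_1} |\varphi_1\Psi_1|^2 \, d^m x \, d^n y + |c_2|^2 \int_{\operatorname{supp}\Psi_1} |\varphi_2\Psi_2|^2 \, d^m x \, d^n y \\
&\quad + 2\Re\Big( c_1^* c_2 \int_{\operatorname{supp}\Psi_1} (\varphi_1\Psi_1)^* \varphi_2\Psi_2 \, d^m x \, d^n y \Big) && \text{(1.3b)} \\
&= |c_1|^2 \int |\varphi_1\Psi_1|^2 \, d^m x \, d^n y = |c_1|^2 , && \text{(1.3c)}
\end{aligned}$$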


where $\Re$ in (1.3b) denotes the real part of a complex quantity. The terms involving
$\Psi_2$ yield zero because of the disjointness³ of the supports of the pointer wave functions
$\Psi_1$, $\Psi_2$. This result holds fapp-forever, since it is fapp-impossible for the wave
function parts $\Psi_1$ and $\Psi_2$ to interfere in the future, especially when the results have
been written down or recorded in any way: the ever growing decoherence.
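
The computation (1.3) can be checked numerically in the simplest setting. The sketch below assumes one system coordinate and one pointer coordinate (m = n = 1), Gaussian wave functions with arbitrarily chosen centers and widths, and pointer packets so far apart that the half-line y > 0 can stand in for the support of the pointer wave function for outcome 1. All names and parameter values are illustrative assumptions, not taken from the text.

```python
import math

def gaussian(u, center, sigma):
    """Real amplitude whose square is a normal density with standard deviation sigma."""
    norm = (2 * math.pi * sigma ** 2) ** -0.25
    return norm * math.exp(-(u - center) ** 2 / (4 * sigma ** 2))

def prob_pointer_on_1(c1, c2, nx=160, ny=400):
    """Midpoint-rule estimate of (1.3a) for a two-dimensional toy model."""
    dx, dy = 16.0 / nx, 20.0 / ny
    xs = [-8.0 + (i + 0.5) * dx for i in range(nx)]
    ys = [-10.0 + (j + 0.5) * dy for j in range(ny)]
    phi1 = [gaussian(x, -1.0, 1.0) for x in xs]  # system wave function 1
    phi2 = [gaussian(x, +1.0, 1.0) for x in xs]  # system wave function 2
    psi1 = [gaussian(y, +5.0, 0.5) for y in ys]  # pointer at position 1
    psi2 = [gaussian(y, -5.0, 0.5) for y in ys]  # pointer at position 2
    total = 0.0
    for j, y in enumerate(ys):
        if y <= 0.0:
            continue  # keep only the region standing in for supp Psi_1
        for i in range(nx):
            amp = c1 * phi1[i] * psi1[j] + c2 * phi2[i] * psi2[j]
            total += abs(amp) ** 2
    return total * dx * dy

c1, c2 = 0.6, 0.8j  # illustrative coefficients with |c1|^2 + |c2|^2 = 1
p1 = prob_pointer_on_1(c1, c2)
print(p1, abs(c1) ** 2)
```

Note that the two system wave functions in this sketch overlap considerably; the cross term still drops out, because only the disjointness of the pointer supports enters the argument.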

Suppose one removes the positions of the particles from the theory, as for ex- 
ample Heisenberg, Bohr, and von Neumann did. Then to be able to conclude from 
the fapp-impossibility of interference that only one of the wave functions remains, 
one needs to add an “observer” who by the act of observation collapses the wave 
function with probability $|c_i|^2$ to the $\varphi_i\Psi_i$ part, thereby “creating” the result $i$. Once
again, we may ask what qualifies an observer as better than a piece of apparatus 
or a cat. Debate of this kind has been going on since the works of Heisenberg and 
Schrödinger in 1926. The debate, apart from producing all kinds of philosophical
treatises, revolves around the collapse of the wave function. But when does the col- 
lapse happen and who is entitled to collapse the wave function? Does the collapse 
happen at all? It is, to put it mildly, a bit puzzling that such an obvious shortcoming 
of the theory has led to such a confused and unfocussed debate. Indeed, it has shown 
physics in a bad light. In Bohmian mechanics, the collapse does not happen at all, 
although there is a fapp-collapse, which one may introduce when one analyzes the 
theory. In collapse theories (like GRW), the collapse does in fact happen. Bohmian 
mechanics and collapse theories differ in that way. They make different predictions 
for macroscopic interference experiments, which may, however, be very difficult to 
perform, if not fapp-impossible. 

We did not say how the particles are guided by the wave function. One gets 
that by simply taking language seriously. Whenever you say “particle” in quantum 
mechanics, mean it! That is Bohmian mechanics. All problems evaporate on the
spot. In a nutshell, the quantity $|\psi_t|^2$, with $\psi_t$ a solution of Schrödinger’s equation,
satisfies a continuity equation, the so-called quantum flux equation. The particles in
Bohmian mechanics move along the flow lines of the quantum flux. In other words
the quantum flux equation is the continuity equation for transport of probability
along the Bohmian trajectories. Still not satisfied? Is that too cheap? Why the wave
function? Why $|\psi_t|^2$? We shall address all these questions and more in the chapters
to follow.

³ The wave functions will in reality overlap, but the overlap is negligible. Therefore the wave
functions are well approximated by wave functions with disjoint supports.

Bohmian mechanics is defined by two equations: one is the Schrödinger equation
for the guiding field $\psi_t$ and one is the equation for the positions of the particles. The
latter equation reads


$$\dot{Q} = \mu \, \Im \frac{\nabla\psi_t(Q)}{\psi_t(Q)} , \tag{1.4}$$


where $\Im$ denotes the imaginary part, and $\mu$ is an appropriate dimension factor. The
quantum formalism in its most general form, including all rules and axioms, follows
from this by analysis of the theory. In particular, Heisenberg’s uncertainty
relation for position and momentum, from which it is often concluded that particle
trajectories are in conflict with quantum mechanics, follows directly from Bohmian
mechanics.

Bohmian mechanics is nonlocal in the sense of Bell’s inequalities and therefore,
according to the experimental tests of Bell’s inequalities, concurs with the basic
requirements that any correct theory of nature must fulfill.
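
To see the guiding equation (1.4) at work, one can integrate it numerically for a single particle in one dimension. The following is a minimal sketch, not a general solver: it assumes units in which the dimension factor equals 1, uses the standard freely spreading Gaussian packet as the guiding field, approximates the gradient by a central difference, and integrates with a Runge-Kutta scheme. For this packet the trajectories are known in closed form, Q(t) = Q(0) sqrt(1 + (t / (2 sigma^2))^2), which gives a check on the numerics. The names `free_gaussian`, `velocity`, and `trajectory` are hypothetical.

```python
import cmath
import math

SIGMA = 1.0  # initial width of the packet (assumed value)

def free_gaussian(x, t):
    """Free Gaussian packet in units hbar = m = 1 (normalization omitted)."""
    z = 1 + 1j * t / (2 * SIGMA ** 2)
    return cmath.exp(-x ** 2 / (4 * SIGMA ** 2 * z)) / cmath.sqrt(z)

def velocity(psi, q, t, mu=1.0, h=1e-5):
    """Right-hand side of (1.4) in one dimension, gradient by central difference."""
    dpsi = (psi(q + h, t) - psi(q - h, t)) / (2 * h)
    return mu * (dpsi / psi(q, t)).imag

def trajectory(psi, q0, t_end, steps=400):
    """Integrate dQ/dt = velocity(Q, t) with the classical Runge-Kutta scheme."""
    dt = t_end / steps
    q, t = q0, 0.0
    for _ in range(steps):
        k1 = velocity(psi, q, t)
        k2 = velocity(psi, q + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(psi, q + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(psi, q + dt * k3, t + dt)
        q += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return q

q_end = trajectory(free_gaussian, q0=1.0, t_end=4.0)
expected = 1.0 * math.sqrt(1 + (4.0 / (2 * SIGMA ** 2)) ** 2)
print(q_end, expected)
```

The particle is carried outward as the packet spreads, and a particle starting at the center of the packet stays at rest, since the velocity field vanishes there.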


A Red Herring: The Double Slit Experiment 


This is a quantum mechanical experiment which is often cited as conflicting with 
the idea that there can be particles with trajectories. One sends a particle (i.e., a wave
packet $\psi$) through a double slit. Behind the slit at some distance is a photographic
plate. When the particle arrives at the plate it leaves a black spot at its place of arrival.
Nothing yet speaks against the idea that the particle moves on a trajectory. But
now repeat the experiment. The next particle marks a different spot of the photographic
plate. Repeating this a great many times the spots begin to show a pattern.
They trace out the points of constructive interference of the wave packet $\psi$ which,
when passing the two slits, shows the typical Huygens interference of two spherical
waves emerging from each slit. Suppose the wave packet reaches the photographic
plate after a time $T$. Then the spots show the $|\psi(T)|^2$ distribution,⁴ in the sense that
this is their empirical distribution. Analyzing this using Bohmian mechanics, i.e.,
analyzing Schrödinger’s equation and the guiding equation (1.4), one immediately
understands why the experiment produces the result it does. It is clear that in each
run the particle goes either through the upper or through the lower slit. The wave
function goes through both slits and forms after the slits a wave function with an
interference pattern. Finally the repetition of the experiment produces an ensemble
which checks Born’s statistical law for that wave function. That is the straightforward
physical explanation.

⁴ In fact, it is the quantum flux across the surface of the photographic plate, integrated over time
(see Chap. 16).

So where is the argument which reveals a conflict with the notion of particle 
trajectories? Here it is: 


Close slit 1 and open slit 2. (1.5a)
The particle goes through slit 2. (1.5b)
It arrives at $x$ on the plate with probability $|\psi_2(x)|^2$, (1.5c)


where $\psi_2$ is the wave function which passed through slit 2. Next


close slit 2 and open slit 1. (1.6a)
The particle goes through slit 1. (1.6b)
It arrives at $x$ on the plate with probability $|\psi_1(x)|^2$, (1.6c)


where $\psi_1$ is the wave function which passed through slit 1. Now open both slits.


Both slits are open. (1.7a)
The particle goes through slit 1 or slit 2. (1.7b)
It arrives at $x$ with probability $|\psi_1(x) + \psi_2(x)|^2$. (1.7c)


Now observe that in general


$$|\psi_1(x) + \psi_2(x)|^2 = |\psi_1(x)|^2 + |\psi_2(x)|^2 + 2\Re\,\psi_1^*(x)\psi_2(x) \neq |\psi_1(x)|^2 + |\psi_2(x)|^2 .$$


The $\neq$ comes from interference of the wave packets $\psi_1$, $\psi_2$ which passed through slit
1 and slit 2. The argument now proceeds in the following way. Situations (1.5b) and 
(1.6b) are the exclusive alternatives entering (1.7b), so the probabilities (1.5c) and 
(1.6c) must add up. But they do not. So is logic false? Is the particle idea nonsense? 
No, the argument is a red herring, since (1.5a), (1.6a), and (1.7a) are physically 
distinct. 
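
The inequality used in this argument is easy to reproduce numerically. The sketch below is a crude model under stated assumptions: each slit emits an unnormalized wave of the form exp(i k r) / sqrt(r), and the wave number, slit separation, and distance to the plate are arbitrary illustrative values. It compares the density with both slits open to the sum of the single-slit densities.

```python
import cmath
import math

def slit_wave(x, slit_pos, distance=1.0, k=40.0):
    """Model wave on the plate from one slit: exp(i k r) / sqrt(r), unnormalized."""
    r = math.hypot(distance, x - slit_pos)
    return cmath.exp(1j * k * r) / math.sqrt(r)

def plate_densities(x, separation=0.2):
    """Return (both slits open, sum of single-slit densities, interference term)."""
    psi1 = slit_wave(x, +separation / 2)
    psi2 = slit_wave(x, -separation / 2)
    both = abs(psi1 + psi2) ** 2
    summed = abs(psi1) ** 2 + abs(psi2) ** 2
    interference = 2 * (psi1.conjugate() * psi2).real
    return both, summed, interference

for x in (-0.5, 0.0, 0.3):
    both, summed, interference = plate_densities(x)
    print(x, both, summed, interference)
```

The interference term changes sign along the plate, which is what produces the fringes. It belongs to the wave function of situation (1.7a) and has no counterpart in the single-slit situations (1.5a) and (1.6a).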


1.2 Determinism and Realism 


It is often said that the aim of Bohmian mechanics is to restore determinism in 
the quantum world. That is false. Determinism has nothing to do with ontology. 
What is “out there” could just as well be governed by stochastic laws, as is the 
case in GRW or dynamical reduction models with, e.g., flash ontology [12, 13]. 
A realistic quantum theory is a quantum theory which spells out what it is about. 
Bohmian mechanics is a realistic quantum theory. It happens to be deterministic, 
which is fine, but not an ontological necessity. The merit of Bohmian mechanics 
is not determinism, but the refutation of all claims that quantum mechanics cannot
be reconciled with a realistic description of reality. In physics, one needs to know
what is going on. Bohmian mechanics tells us what is going on and it does so in the 
most straightforward way imaginable. It is therefore the fundamental description of 
Galilean physics. 

The following passage taken from a letter from Pauli to Born, concerning Ein- 
stein’s view on determinism, is in many ways reminiscent of the present situation in 
Bohmian mechanics: 


Einstein gave me your manuscript to read; he was not at all annoyed with you, but only 
said that you were a person who will not listen. This agrees with the impression I have 
formed myself insofar as I was unable to recognise Einstein whenever you talked about 
him in either your letter or your manuscript. It seemed to me as if you had erected some 
dummy Einstein for yourself, which you then knocked down with great pomp. In particu- 
lar, Einstein does not consider the concept of “determinism” to be as fundamental as it is 
frequently held to be (as he told me emphatically many times), and he denied energetically 
that he had ever put up a postulate such as (your letter, para. 3): “the sequence of such con- 
ditions must also be objective and real, that is, automatic, machine-like, deterministic.” In 
the same way, he disputes that he uses as a criterion for the admissibility of a theory the 
question: “Is it rigorously deterministic?” Einstein’s point of departure is “realistic” rather 
than “deterministic”. 


Wolfgang Pauli, in [14], p. 221. 
Chapter 2
Classical Physics 


What is classical physics? In fact it has become the name for non-quantum physics. 
This begs the question: What is quantum physics in contrast to classical physics? 
One readily finds the statement that in classical physics the world is described by 
classical notions, like particles moving around in space, while in modern physics, 
i.e., quantum mechanics, the classical notions are no longer adequate, so there is 
no longer such a “naive” description of what is going on. In a more sophisticated 
version, quantum mechanics is physics in which the position and momentum of a 
particle are operators. But such a statement as it stands is meaningless. One also 
reads that the difference between quantum physics and classical physics is that the 
former has a smallest quantum of action, viz., Planck’s constant $\hbar$, and that classical
physics applies whenever the action is large compared to $\hbar$, and in many circumstances,
this is a true statement.

But our own viewpoint is better expressed as follows. Classical physics is the 
description of the world when the interference effects of the Schrödinger wave,
evolving according to Schrödinger’s equation, can be neglected. This is the case
for a tremendously wide range of scales from microscopic gases to stellar matter. 
In particular it includes the scale of direct human perception, and this explains why 
classical physics was found before quantum mechanics. Still, the viewpoint just ex- 
pressed should seem puzzling. For how can classical motion of particles emerge 
from a wave equation like the Schrödinger equation? This is something we shall
explain. It is easy to understand, once one writes down the equations of motion of 
Bohmian mechanics. But first let us discuss the theory which governs the behavior 
of matter across the enormous range of classical physics, namely, Newtonian me- 
chanics. In a letter to Hooke, Newton wrote: “If I have seen further it is by standing 
on the shoulders of giants.” 


D. Dürr, S. Teufel, Bohmian Mechanics, DOI 10.1007/978-3-540-89344-8_2
2.1 Newtonian Mechanics


Newtonian mechanics is about point particles. What is a point particle? It is “stuff” 
or “matter” that occupies a point in space called its position, described mathematically
by $q \in \mathbb{R}^3$. The theory describes the motion of point particles in space.
Mathematically, an $N$-particle system is described by the positions of the $N$ particles:


$$q_1,\dots,q_N, \qquad q_i \in \mathbb{R}^3,$$


which change with time, so that one has trajectories $q_1(t),\dots,q_N(t)$, where the
parameter $t \in \mathbb{R}$ is the time.

Newtonian mechanics is given by equations — the physical law — which govern 
the trajectories, called the equations of motion. They can be formulated in many 
different (but more or less equivalent) ways, so that the physical law looks different 
for each formulation, but the trajectories remain the same. We shall soon look at an 
example. Which formulation one prefers will be mainly a matter of taste. One may 
find the arguments leading to a particular formulation more satisfactory or convinc- 
ing than others. 

To formulate the law of Newtonian mechanics one introduces positive parameters,
called masses, viz., $m_1,\dots,m_N$, which represent “matter”, and the law reads


$$m_i \ddot q_i = F_i(q_1,\dots,q_N). \tag{2.1}$$


$F_i$ is called the force. It is in general a function of all particle positions. Put
another way, it is a function of the configuration, i.e., the family of all coordinates
$(q_1,\dots,q_N) \in \mathbb{R}^{3N}$. The set of all such $N$-tuples is called configuration space. The
quantity $\dot q_i = dq_i/dt = v_i$ is the velocity of the $i$th particle, and its derivative $\ddot q_i$ is
called the acceleration.

Newtonian mechanics is romantic in a way. One way of talking about it is to 
say that particles accelerate each other, they interact through forces exerted upon 
each other, i.e., Newtonian mechanics is a theory of interaction. The fundamental 
interaction is gravitation or mass attraction given by 


$$F_i(q_1,\dots,q_N) = \sum_{j\neq i} G\, m_i m_j\, \frac{q_j - q_i}{|q_j - q_i|^3}, \tag{2.2}$$


with G the gravitational constant. 

All point particles of the Newtonian universe interact according to (2.2). In ef- 
fective descriptions of subsystems (when we actually use Newtonian mechanics in 
everyday life), other forces like harmonic forces of springs can appear on the right- 
hand side of (2.1). Such general forces need not (and in general will not) arise from 
gravitation alone. Electromagnetic forces will also play a role, i.e., one can some- 
times describe electromagnetic interaction between electrically charged particles by 
the Coulomb force using Newtonian mechanics. The Coulomb force is similar to 
(2.2), but may have a different sign, and the masses $m_i$ are replaced by the charges
$e_i$, which may be positive or negative.
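
To make the force law (2.2) concrete, here is a minimal sketch in Python of how the forces on all particles could be computed from a given configuration. The function name `gravitational_forces`, the list-based representation of positions, and the SI default value of the gravitational constant are our own assumptions for illustration and are not taken from the text.

```python
import math

G_SI = 6.674e-11  # assumed value of the gravitational constant in SI units


def gravitational_forces(positions, masses, G=G_SI):
    """Force F_i on each particle according to (2.2).

    positions: list of N points in R^3, each a sequence of 3 floats
    masses:    list of N positive numbers
    Returns a list of N force vectors (lists of 3 floats).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j == i:
                continue  # the sum in (2.2) runs over j != i
            diff = [positions[j][k] - positions[i][k] for k in range(3)]
            dist = math.sqrt(sum(d * d for d in diff))
            if dist == 0.0:
                # (2.2) is undefined when two particles occupy the same position
                raise ValueError("two particles occupy the same position")
            factor = G * masses[i] * masses[j] / dist ** 3
            for k in range(3):
                forces[i][k] += factor * diff[k]
    return forces


# Two unit masses one unit apart, in units where G = 1 (hypothetical example)
example = gravitational_forces([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0], G=1.0)
```

In the example each particle is pulled toward the other, and the two forces are equal in magnitude and opposite in direction. Replacing the masses by charges and flipping the sign of the prefactor would give a sketch of the Coulomb force mentioned above.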
One may wonder why Newtonian mechanics can be successfully applied to subsystems
like the solar system, or even smaller systems like systems on earth. That
is, why can one ignore all the rest of the universe? One can give various reasons. For 
example, distant matter which surrounds the earth in a homogeneous way produces 
a zero net field. The force (2.2) falls off with large distances and the gravitational 
constant is very small. In various general situations, and depending on the practical 
task in hand, such arguments allow a good effective description of the subsystem in 
which one ignores distant matter, or even not so distant matter. 


Remark 2.1. Initial Value Problem 


The equation (2.1) is a differential equation and thus poses an initial value problem,
i.e., the trajectories $q_i(t)$, $t \in \mathbb{R}$, which obey (2.1) are only determined once initial
data of the form $q_i(t_0), \dot q_i(t_0)$ are given, where $t_0$ is some time, called the initial
time. This means that the future and past evolution of the trajectories is determined
by the “present” state $q_i(t_0), \dot q_i(t_0)$. Note that the position alone is not sufficient to
determine the state of a Newtonian system.

It is well known that differential equations need not have unique and global solu- 
tions, i.e., solutions which exist for all times for all initial values. What does exist, 
however, is — at least in the case of gravitation — a local unique solution for a great 
many initial conditions, i.e., a solution which exists uniquely for some short period 
of time, if the initial values are reasonable. So (2.1) and (2.2) have no solution if, for 
example, two particles occupy the same position. Further, for the solution to exist, it 
must not happen that two or more particles collide, i.e., that they come together and 
occupy the same position. It is a famous problem in mathematical physics to estab- 
lish what is called the existence of dynamics for a gravitating many-particle system, 
where one hopes to show that solutions fail to exist globally only for exceptional 
initial values. But what does “exceptional” mean? We shall answer this in a short 
while.


We wish to comment briefly on the manner of speaking about interacting particles, 
which gives a human touch to Newtonian mechanics. We say that the particles at- 
tract each other. Taking this notion to heart, one might be inclined to associate with 
the notion of particle more than just an object which has a position. But that might 
be misleading, since no matter how one justifies or speaks about Newtonian me- 
chanics, when all is said and done, there remains a physical law about the motion 
of point particles, and that is a mathematical expression about changes of points in 
space with time. We shall explore one such prosaic description next. 


2.2 Hamiltonian Mechanics 


One can formulate the Newtonian law differently. Different formulations are based 
on different fundamental principles, like for example the principle of least action. 
But never mind such principles for the moment. We shall simply observe that it is 
mathematically much nicer to rewrite everything in terms of configuration space
variables:


$$q = \begin{pmatrix} q_1 \\ \vdots \\ q_N \end{pmatrix} \in \mathbb{R}^{3N},$$


that is, we write the differential equation for all particles in a compact form as


$$m\ddot q = F, \tag{2.3}$$


with


$$F = \begin{pmatrix} F_1 \\ \vdots \\ F_N \end{pmatrix}$$


and the mass matrix $m = (\delta_i^j m_j)_{i,j=1,\dots,N}$.


Fig. 2.1 Configuration space for 3 particles in a one-dimensional world


Configuration space cannot be depicted (but see Fig. 2.1 for a very special situa- 
tion), at least not for a system of more than one particle, because it is 6-dimensional 
for 2 particles in physical space. It is thus not so easy to think intuitively about 
things going on in configuration space. But one had better build up some intuition 
for configuration space, because it plays a fundamental role in quantum theory. 

A differential equation is by definition a relation between the flow and the vector 
field. The flow is the mapping along the solution curves, which are integral curves 
along the vector field (the tangents of the solution curves). If a physical law is given 
by a differential equation, the vector field encodes the physical law. Let us see how 


this works. 
Fig. 2.2 Phase space description of the mathematically idealized harmonically swinging
pendulum. The possible trajectories of the mathematically idealized pendulum swinging
in a plane with frequency 1 are concentric circles in phase space. The sets M and M(t)
will be discussed later


The differential equation (2.3) is of second order and does not express the rela- 
tion between the integral curves and the vector field in a transparent way. We need 
to change (2.3) into an equation of first order, so that the vector field becomes trans- 
parent. For this reason we consider the phase space variables 


$$\begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} q_1 \\ \vdots \\ q_N \\ p_1 \\ \vdots \\ p_N \end{pmatrix} \in \mathbb{R}^{3N}\times\mathbb{R}^{3N} = \Gamma,$$


which were introduced by Boltzmann,¹ where we consider positions and velocities.
However, for convenience of notation, the latter are replaced by momenta $p_i = m_i v_i$.
One point in $\Gamma$ represents the present state of the entire $N$-particle system. The phase
space has twice the dimension of configuration space and can be depicted for one
particle moving in one dimension, e.g., the pendulum (see Fig. 2.2).


¹ The notion of phase space was taken by Ludwig Boltzmann (1844-1906) as synonymous with
the state space, the phase being the collection of variables which uniquely determine the physical
state. The physical state is uniquely determined if its future and past evolution in time is uniquely
determined by the physical law.


Clearly, (2.3) becomes


$$\begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = \begin{pmatrix} m^{-1} p \\ F(q) \end{pmatrix}. \tag{2.4}$$


The state of the $N$-particle system is completely determined by $\begin{pmatrix} q \\ p \end{pmatrix}$, because (2.4)
and the initial values $\begin{pmatrix} q(t_0) \\ p(t_0) \end{pmatrix}$ uniquely determine the phase space trajectory (if the
initial value problem allows for a solution).
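For (2.2) and many other effective forces, there exists a function $V$ on $\mathbb{R}^{3N}$, the
so-called potential energy function, with the property that


$$F = -\mathrm{grad}_q V = -\frac{\partial V}{\partial q} = -\nabla V.$$


Using this we may write (2.4) as


$$\begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = \begin{pmatrix} \dfrac{\partial H}{\partial p}(q,p) \\[2mm] -\dfrac{\partial H}{\partial q}(q,p) \end{pmatrix}, \tag{2.5}$$


where


$$H(q,p) = \frac{1}{2}\,(p\cdot m^{-1}p) + V(q) = \frac{1}{2}\sum_{i=1}^{N} \frac{p_i^2}{m_i} + V(q_1,\dots,q_N). \tag{2.6}$$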

Now we have the Newtonian law in the form of a transparent differential equation 
(2.5), expressing the relation between the integral curves (on the left-hand side, 
differentiated to yield tangent vectors) and the vector field on the right-hand side 
(which are the tangent vectors expressing the physics). The way we have written 
it, the vector field is actually generated by a function H (2.6) on phase space. This 
is called the Hamilton function, after its inventor William Rowan Hamilton (1805–1865),
who in fact introduced the symbol H in honor of the physicist Christiaan
Huygens (1629-1695). We shall see later what the “wave man” Huygens has to do 
with all this. The role of the Hamilton function H(q,p) is to give the vector field 


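$$v^H(q,p) = \begin{pmatrix} \dfrac{\partial H}{\partial p} \\[2mm] -\dfrac{\partial H}{\partial q} \end{pmatrix}, \tag{2.7}$$


and the Hamiltonian dynamics is simply given by


$$\begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = v^H(q,p). \tag{2.8}$$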


The function H allows us to focus on a particular structure of Newtonian mechanics, 
now rewritten in Hamiltonian terms. Almost all of this section depends solely on this 
structure, and we shall see some examples shortly. Equations (2.5) and (2.6) with 
the Hamilton function H(q,p) define a Hamiltonian system. 
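
The way a Hamilton function generates the dynamics can be sketched numerically. The following Python fragment is only an illustration under our own assumptions: one degree of freedom, unit mass, the oscillator $H = p^2/2 + q^2/2$ (the idealized pendulum of Fig. 2.2 with frequency 1), a central-difference approximation of the vector field (2.7), and a classical Runge-Kutta step for (2.8). None of the function names come from the text.

```python
import math


def vector_field(H, q, p, h=1e-6):
    """Central-difference approximation of v^H = (dH/dp, -dH/dq), cf. (2.7)."""
    dH_dq = (H(q + h, p) - H(q - h, p)) / (2.0 * h)
    dH_dp = (H(q, p + h) - H(q, p - h)) / (2.0 * h)
    return dH_dp, -dH_dq


def rk4_step(H, q, p, dt):
    """One classical Runge-Kutta step for the first-order system (2.8)."""
    k1q, k1p = vector_field(H, q, p)
    k2q, k2p = vector_field(H, q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = vector_field(H, q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = vector_field(H, q + dt * k3q, p + dt * k3p)
    q_new = q + dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    p_new = p + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q_new, p_new


def evolve(H, q, p, t, steps=1000):
    """Follow the integral curve through (q, p) for a time t."""
    dt = t / steps
    for _ in range(steps):
        q, p = rk4_step(H, q, p, dt)
    return q, p


def H_oscillator(q, p):
    """Assumed example: unit mass, unit frequency."""
    return 0.5 * p * p + 0.5 * q * q


# A quarter period: the phase point moves along a circle from (1, 0) towards (0, -1)
q_end, p_end = evolve(H_oscillator, 1.0, 0.0, math.pi / 2.0)
```

Only the function `H_oscillator` carries the physics here. Everything else is generic, which is the sense in which the vector field is generated by $H$.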

The integral curves along this vector field (2.7) represent the possible system
trajectories in phase space, i.e., they are solutions
$\begin{pmatrix} q(t,(q,p)) \\ p(t,(q,p)) \end{pmatrix}$ of (2.8) for given initial values
$\begin{pmatrix} q(0,(q,p)) \\ p(0,(q,p)) \end{pmatrix} = \begin{pmatrix} q \\ p \end{pmatrix}$.
Note that this requires existence and uniqueness of solutions of the differential
equations (2.8). One possible evolution of the entire system is represented by one
curve in phase space (see Fig. 2.3), which is called a flow line, and one defines the
Hamiltonian flow by the map $(\Phi_t^H)_{t\in\mathbb{R}}$ from phase space to phase space, given by
the prescription that, for any $t$, a point in phase space is mapped to the point to which
it moves in time $t$ under the evolution (as long as that evolution is defined, see
Remark 2.1):


$$\Phi_t^H \begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} q(t,(q,p)) \\ p(t,(q,p)) \end{pmatrix}.$$


We shall say more about the flow map later on. The flow can be thought of pictorially
as the flow of a material fluid in $\Gamma$, with the system trajectories as flow lines.

Hamiltonian mechanics is another way of talking about Newtonian mechanics. 
It is a prosaic way of talking about the motion of particles. The only romance left 
is the secret of how to write down the physically relevant H. Once that is done, 
the romance is over and what lies before one are the laws of mechanics written in 
mathematical language. So that is all that remains. The advantage of the Hamilto- 
nian form is that it directly expresses the law as a differential equation (2.8). And 
it has the further advantage that it allows one to talk simultaneously about all pos- 
sible trajectories of a system. This will be helpful when we need to define a typical 
trajectory of the system, which we must do later. 

However, this does not by any means imply that we should forget the Newtonian 
approach altogether. To understand which path a system takes, it is good to know 
how the particles in the system interact with each other, and to have some intuition 
about that. Moreover, we should not lose sight of what we are interested in, namely, 
the behavior of the system in physical space. Although we have not elaborated on the 
issue at all, it is also important to understand the physical reasoning which leads to 
the mathematical law (for example, how Newton found the gravitational potential), 
as this may give us confidence in the correctness of the law. Of course we also 
achieve confidence by checking whether the theory correctly describes what we see, 
but since we can usually only see a tiny fraction of what a theory says, confidence 
is mainly grounded on theoretical insight. 

Fig. 2.3 The Hamilton function generates a vector field on the 6N-dimensional phase space of an
N-particle system in physical space. The integral curves are the possible trajectories of the entire
system in phase space. Each point in phase space is the collection of all the positions and velocities
of all the particles. One must always keep in mind that the trajectories in phase space are not
trajectories in physical space. They can never cross each other because they are integral curves on
a vector field, and a unique vector is attached to every point of phase space. Trajectories in phase
space do not interact with each other! They are not the trajectories of particles


The fundamental properties of the Hamiltonian flow are conservation of energy
and conservation of volume. These properties depend only on the form of the equations
(2.8) with (2.7), i.e., $H(q,p)$ can be a completely general function of $(q,p)$
and need not be the function (2.6). When working with this generality, one calls $p$
the canonical momentum, which is no longer simply velocity times mass. Now,
conservation of energy means that the value of the Hamilton function does not change
along trajectories. This is easy to see. Let $(q(t),p(t))$, $t \in \mathbb{R}$, be a solution of (2.8).
Then


$$\frac{d}{dt} H(q(t),p(t)) = \frac{\partial H}{\partial q}\cdot\dot q + \frac{\partial H}{\partial p}\cdot\dot p = \frac{\partial H}{\partial q}\cdot\frac{\partial H}{\partial p} - \frac{\partial H}{\partial p}\cdot\frac{\partial H}{\partial q} = 0. \tag{2.9}$$


More generally, the time derivative along the trajectories of any function $f(q(t),p(t))$
on phase space is


$$\frac{d}{dt} f(q(t),p(t)) = \frac{\partial f}{\partial q}\cdot\dot q + \frac{\partial f}{\partial p}\cdot\dot p = \frac{\partial H}{\partial p}\cdot\frac{\partial f}{\partial q} - \frac{\partial H}{\partial q}\cdot\frac{\partial f}{\partial p} =: \{f,H\}. \tag{2.10}$$


The term $\{f,H\}$ is called the Poisson bracket of $f$ and $H$. It can also be defined in
more general terms for any pair of functions $f,g$, viewing $g$ as the Hamilton function
and $\Phi_t^g$ as the flow generated by $g$:


$$\{f,g\} = \frac{d}{dt}\, f\circ\Phi_t^g = \frac{\partial g}{\partial p}\cdot\frac{\partial f}{\partial q} - \frac{\partial g}{\partial q}\cdot\frac{\partial f}{\partial p}. \tag{2.11}$$


Note that $\{f,H\} = 0$ means that $f$ is a constant of the motion, i.e., the value of $f$
remains unchanged along a trajectory ($df/dt = 0$), with $f = H$ being the simplest
example.
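
The Poisson bracket is easy to probe numerically. The sketch below, again for one degree of freedom and with the oscillator Hamilton function chosen by us as an example, approximates the partial derivatives in (2.11) by central differences. The helper names `poisson_bracket`, `position` and `momentum` are hypothetical.

```python
def poisson_bracket(f, g, q, p, h=1e-5):
    """{f,g} = dg/dp * df/dq - dg/dq * df/dp at the point (q, p), cf. (2.11)."""
    df_dq = (f(q + h, p) - f(q - h, p)) / (2.0 * h)
    df_dp = (f(q, p + h) - f(q, p - h)) / (2.0 * h)
    dg_dq = (g(q + h, p) - g(q - h, p)) / (2.0 * h)
    dg_dp = (g(q, p + h) - g(q, p - h)) / (2.0 * h)
    return dg_dp * df_dq - dg_dq * df_dp


def H(q, p):
    """Assumed example Hamilton function: unit mass, unit frequency."""
    return 0.5 * p * p + 0.5 * q * q


def position(q, p):
    return q


def momentum(q, p):
    return p


q0, p0 = 0.3, -1.2
bracket_qH = poisson_bracket(position, H, q0, p0)  # approximates dq/dt = p0
bracket_pH = poisson_bracket(momentum, H, q0, p0)  # approximates dp/dt = -q0
bracket_HH = poisson_bracket(H, H, q0, p0)         # H is a constant of the motion
```

With $f = q$ and $f = p$ the bracket with $H$ reproduces the two components of the Hamiltonian equations (2.8), and with $f = H$ it vanishes, in line with (2.9).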

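Now we come to the conservation of volume. Recall that the Hamiltonian flow
$(\Phi_t^H)_{t\in\mathbb{R}}$ is best pictured as a fluid flow in $\Gamma$, with the system trajectories as flow
lines. These are the integral curves along the Hamiltonian vector field $v^H(q,p)$ (2.7).
These flow lines have neither sources nor sinks, i.e., the vector field is divergence-free:


$$\mathrm{div}\, v^H(q,p) = \left(\frac{\partial}{\partial q}, \frac{\partial}{\partial p}\right)\cdot\begin{pmatrix} \dfrac{\partial H}{\partial p} \\[2mm] -\dfrac{\partial H}{\partial q} \end{pmatrix} = \frac{\partial^2 H}{\partial q\,\partial p} - \frac{\partial^2 H}{\partial p\,\partial q} = 0. \tag{2.12}$$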


This important (though rather trivial) mathematical fact is known as Liouville’s the- 
orem for the Hamiltonian flow, after Joseph Liouville (1809-1882). (It has noth- 
ing to do with Liouville’s theorem in complex analysis.) A fluid with a flow that is 
divergence-free is said to be incompressible, a behavior different from air in a pump, 
which gets very much compressed. Consequently, and as we shall show below, the 
“volume” of any subset in phase space which gets transported via the Hamiltonian 
flow remains unchanged. Before we express this in mathematical terms and give the 
proof, we shall consider the issue in more general terms. 


Remark 2.2. On the Time Evolution of Measures. 

The notion of volume deserves some elaboration. Clearly, since phase space is very 
high-dimensional, the notion of volume here is more abstract than the volume of a 
three-dimensional object. In fact, we shall later use a notion of volume which is not 
simply the trivial extension of three-dimensional volume. Volume here refers to a 
measure, the size or weight of sets, where one may in general want to consider a 
biased weight. The most famous measure, and in fact the mother of all measures, 
is the generalization of the volume of a cube to arbitrary subsets, known as the 
Lebesgue measure $\lambda$. We shall say more about this later.² If one feels intimidated by
the name Lebesgue measure, then take $|A| = \int_A d^n x$, the usual Riemann integral, as
the (fapp-correct) Lebesgue measure of $A$. The measure may in a more general sense
be thought of as some kind of weight distribution, where the Lebesgue measure
gives equal (i.e., unbiased) weight to every point. For a continuum of points, this is
a somewhat demanding notion, but one may nevertheless get a feeling for what is
meant. For the time being we require that the measure be an additive nonnegative set
function, i.e., a function which attributes positive or zero values to sets, and which
is additive on disjoint sets: $\mu(A \cup B) = \mu(A) + \mu(B)$. The role of the measure will


² We need to deal with the curse of the continuum, which is that not all subsets of $\mathbb{R}^n$ actually
have a volume, or as we now say, a measure. There are non-measurable sets within the enormous 
multitude of subsets. These non-measurable sets exist mathematically, but are not constructible 
in any practical way out of unions and intersections of simple sets, like cubes or balls. They are 
nothing we need to worry about in practical terms, but they are nevertheless there, and so must be 
dealt with properly. This we shall do in Sect. 4.3.1. 
eventually be to tell us the size of a set, i.e., which sets are small and which are big.
Big sets are important, while small ones are not. 

It may be best to think for now of a measure in general as some abstract way of 
attributing “mass” to subsets. One is then led to ask how the measure (or the mass) 
changes with a flow. That question was first asked for the Hamiltonian flow, but it 
can be asked, and in fact has been asked, for flows of a general character. We shall 
do the same now. Let $\mu$ be a measure on the phase space $\Gamma$, which we take for
simplicity as being $\mathbb{R}^n$. We consider now a general (not necessarily Hamiltonian)
flow map on phase space, that is, a one-parameter family of maps $(\Phi_t)_{t\in\mathbb{R}}$, with
parameter “time” $t$:


$$(\Phi_t(x))_{t\in\mathbb{R}},\ x \in \mathbb{R}^n: \qquad \Phi_t\circ\Phi_s(x) = \Phi_{t+s}(x), \quad \Phi_0(x) = x. \tag{2.13}$$


In general, any flow $\Phi_t$ on $\mathbb{R}^n$ (or some general phase space) naturally defines the
time evolution of the measure $\mu_t$ on $\mathbb{R}^n$:


$$\mu_t = \mu\circ\Phi_{-t}, \tag{2.14}$$

which means


$$\mu_t(A) = \mu(\Phi_{-t}A), \quad\text{or}\quad \mu_t(\Phi_t A) = \mu(A), \tag{2.15}$$


for all (measurable) sets $A$ and all $t \in \mathbb{R}$. Behind this definition is the simple logic
that, if a measure $\mu$ is given at time $t = 0$, then the measure $\mu_t$ of a set $A$ is the
measure $\mu$ of the set from which $A$ originated by virtue of the flow. In other words
the measure changes only because the set changes.

The notion of stationary measure is very important. This is a measure that does
not change under the flow $\Phi_t$, i.e.,


$$\mu_t(A) = \mu(A), \quad \forall t \in \mathbb{R},\ \forall A. \tag{2.16}$$


The stationary measure plays a distinguished role in justifying probabilistic rea- 
soning. Its importance was presumably first discovered by Boltzmann, and later on 
we shall spend some time considering Boltzmann’s general ideas about statistical 
physics, which are valid for a whole range of theories. 

The above-mentioned preservation of volume for Hamiltonian flows as a con- 
sequence of Liouville’s theorem refers to phase space volume and is the assertion 
that 


$$\lambda(\Phi_{-t}A) = \lambda(A), \tag{2.17}$$


with $\lambda$ the Lebesgue measure on phase space. This means that, under the Hamiltonian
flow, sets change their shape but not their volume. This may also be referred to
as Liouville’s theorem, since it is a direct consequence of (2.12), as we shall show
next. For the pendulum in Fig. 2.2, one sees this immediately, since the slices of the
pie just rotate. The general situation is depicted in Fig. 2.4.
Fig. 2.4 The preservation of volume of the Hamiltonian flow. The set M in phase space changes
under the flow but its volume remains the same 


Remark 2.3. Continuity Equation 

From the change in the measure (2.15) for a general flow generated by a vector field,
one can derive a differential equation which governs the change in the density of the
measure. That differential equation is called the continuity equation. If the measure
has a density $\rho(x)$ (you may think of a mass density), then the change of measure
with time defines a time-dependent density $\rho(x,t)$, and one has the logical relation³


$$\mu_t(A) =: \int \chi_A(x)\,\rho(x,t)\,d^n x = \mu(\Phi_{-t}A) = \int \chi_{\Phi_{-t}A}(x)\,\rho(x)\,d^n x = \int \chi_A(\Phi_t(x))\,\rho(x)\,d^n x, \tag{2.18}$$


where $\chi_A$ is the characteristic function of the set $A \subset \Gamma$, also called the indicator
function of the set $A$, i.e., the function which is 1 on $A$ and zero otherwise, and
$\Phi_{-t}A = \{x \in \Gamma \mid \Phi_t(x) \in A\}$. Furthermore $\Phi_t$ is the solution flow map of some
vector field $v(x)$ on $\mathbb{R}^n$ (or some general phase space), i.e.,


$$\frac{d}{dt}\Phi_t(x) = v(\Phi_t(x)). \tag{2.19}$$


We shall now show that the density $\rho(x,t)$ satisfies the continuity equation:


$$\frac{\partial}{\partial t}\rho(x,t) + \mathrm{div}\,[v(x)\rho(x,t)] = 0. \tag{2.20}$$


³ Note in passing that $\rho(x,t)$ can be computed from an obvious change of variables in the last
integral, namely, $\rho(x,t) = \rho(\Phi_{-t}(x))\,|\partial\Phi_{-t}(x)/\partial x|$.
To see this, replace the indicator function in (2.18) by a smooth function $f$ with
compact support:


$$\int f(\Phi_t(x))\,\rho(x)\,d^n x = \int f(x)\,\rho(x,t)\,d^n x. \tag{2.21}$$

Now differentiate (2.21) with respect to $t$ to get


$$\int \frac{d\Phi_t(x)}{dt}\cdot[\nabla f(\Phi_t(x))]\,\rho(x)\,d^n x = \int f(x)\,\frac{\partial}{\partial t}\rho(x,t)\,d^n x. \tag{2.22}$$


Replacing $d\Phi_t(x)/dt$ by the right-hand side of (2.19) and using (2.21) again with $f$
replaced by $v(x)\cdot[\nabla f(x)]$ (a wonderful trick), the left-hand side of (2.22) becomes


$$\int v(x)\cdot[\nabla f(x)]\,\rho(x,t)\,d^n x, \tag{2.23}$$


and after partial integration,


$$-\int f(x)\,\mathrm{div}\,[v(x)\rho(x,t)]\,d^n x. \tag{2.24}$$


Since this is equal to the right-hand side of (2.22) and since $f$ is arbitrary, we conclude
that (2.20) holds.
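
A one-dimensional example may help to see the continuity equation at work. In the sketch below we assume, purely for illustration, the non-Hamiltonian vector field $v(x) = -x$ with flow $\Phi_t(x) = x\,e^{-t}$ and a Gaussian initial density. The transported density is then computed from the change-of-variables formula of footnote 3, and the left-hand side of (2.20) is evaluated by central differences. All names in the code are our own.

```python
import math


def rho0(x):
    """Assumed initial density: standard Gaussian."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def v(x):
    """Assumed vector field; its flow is Phi_t(x) = x * exp(-t)."""
    return -x


def rho(x, t):
    """rho(x,t) = rho0(Phi_{-t}(x)) * |d Phi_{-t}(x)/dx| with Phi_{-t}(x) = x * exp(t)."""
    return rho0(x * math.exp(t)) * math.exp(t)


def continuity_residual(x, t, h=1e-5):
    """Central-difference value of d(rho)/dt + d(v * rho)/dx, which should vanish."""
    d_rho_dt = (rho(x, t + h) - rho(x, t - h)) / (2.0 * h)
    d_flux_dx = (v(x + h) * rho(x + h, t) - v(x - h) * rho(x - h, t)) / (2.0 * h)
    return d_rho_dt + d_flux_dx


def total_mass(t, lo=-20.0, hi=20.0, n=40000):
    """Midpoint-rule approximation of the integral of rho(., t) over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(rho(lo + (i + 0.5) * dx, t) for i in range(n)) * dx


residual = continuity_residual(0.4, 0.7)
mass_initial = total_mass(0.0)
mass_later = total_mass(1.0)
```

The flow contracts everything towards the origin, so the density grows there, but the total mass stays the same. This is the content of (2.15): the measure changes only because the set changes.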


Now we ask whether there is a stationary measure (2.16). In terms of densities, the
question is: Does there exist a stationary density, that is, a density independent of
time, satisfying (2.20)? Since the time derivative part of (2.20) vanishes, i.e.,


$$\frac{\partial}{\partial t}\rho(x) = 0,$$


the density must satisfy the partial differential equation

$$\mathrm{div}\,[v(x)\rho(x)] = 0. \tag{2.25}$$


This is in general a very difficult question. In particular, it is almost impossible to
find the solution for a general vector field. However, not so in the Hamiltonian case,
where the answer turns out to be trivial! This is a consequence of (2.12). Setting


$$x = \begin{pmatrix} q \\ p \end{pmatrix},$$


equation (2.12) reads $\mathrm{div}\, v(x) = 0$ for all $x$. In this case, (2.20) becomes (after using
the product rule)


$$\frac{\partial}{\partial t}\rho(x,t) + v(x)\cdot\nabla\rho(x,t) = 0. \tag{2.26}$$
Now $\rho(x,t) = \text{constant}$, which we may choose to be unity, is obviously stationary.
Putting $f = \chi_A$, $A \subset \Gamma$, and taking into account $\rho = 1$, (2.21) yields


$$\int \chi_A(\Phi_t(x))\,d^n x = \int \chi_{\Phi_{-t}A}(x)\,d^n x = \int \chi_A(x)\,d^n x = \lambda(A). \tag{2.27}$$

In short, (2.27) says

$$\lambda(\Phi_{-t}A) = \lambda(A). \tag{2.28}$$


We may as well put the future set $\Phi_t A$ into (2.28), instead of $A$, and use $\Phi_{-t}\Phi_t = \mathrm{id}$
(identity), whence


$$\lambda(A) = \lambda(\Phi_t A). \tag{2.29}$$


In conclusion, Liouville’s theorem implies that the Lebesgue measure (= volume) is 
stationary for Hamiltonian flows. 
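
Volume preservation can also be checked numerically. Locally, the factor by which the flow map changes volume is the determinant of its Jacobian matrix, so for a Hamiltonian flow this determinant should equal 1. The sketch below is an illustration under our own assumptions: the pendulum with $H = p^2/2 - \cos q$, a Runge-Kutta approximation of the flow, and finite differences for the Jacobian. For contrast we add a friction term $-\gamma p$, which is not Hamiltonian. Its vector field has constant divergence $-\gamma$, and the determinant is then $e^{-\gamma t}$.

```python
import math


def pendulum_field(q, p, gamma):
    """q' = p, p' = -sin(q) - gamma * p. Hamiltonian (H = p^2/2 - cos q) only if gamma = 0."""
    return p, -math.sin(q) - gamma * p


def flow(q, p, t, gamma=0.0, steps=2000):
    """Runge-Kutta approximation of the flow map Phi_t applied to the point (q, p)."""
    dt = t / steps
    for _ in range(steps):
        k1q, k1p = pendulum_field(q, p, gamma)
        k2q, k2p = pendulum_field(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p, gamma)
        k3q, k3p = pendulum_field(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p, gamma)
        k4q, k4p = pendulum_field(q + dt * k3q, p + dt * k3p, gamma)
        q += dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        p += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q, p


def jacobian_determinant(q, p, t, gamma=0.0, h=1e-5):
    """Finite-difference determinant of the Jacobian of Phi_t at (q, p)."""
    qa, pa = flow(q + h, p, t, gamma)
    qb, pb = flow(q - h, p, t, gamma)
    qc, pc = flow(q, p + h, t, gamma)
    qd, pd = flow(q, p - h, t, gamma)
    dq_dq, dp_dq = (qa - qb) / (2.0 * h), (pa - pb) / (2.0 * h)
    dq_dp, dp_dp = (qc - qd) / (2.0 * h), (pc - pd) / (2.0 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq


det_hamiltonian = jacobian_determinant(1.0, 0.5, 1.0)           # close to 1
det_damped = jacobian_determinant(1.0, 0.5, 1.0, gamma=0.5)     # close to exp(-0.5)
```

The damped pendulum behaves like the air in the pump: phase space volume gets compressed. The Hamiltonian pendulum only reshapes it.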


Remark 2.4. Time-Dependent Vector Fields 

The continuity equation also holds for time-dependent vector fields $v(x,t)$, in which
case the flow map is a two-parameter group $\Phi_{t,s}$ advancing points from time $s$ to time
$t$. All one needs to do is to replace the vector field in (2.20) by the time-dependent
expression $v(x,t)$, and the proof goes through verbatim. But now the notion of a
stationary measure seems unsuitable, since the velocity field (representing the physical
law) changes with time. But remarkably the notion still applies for Hamiltonian
flows, i.e., even in the case where the Hamiltonian is time dependent (energy is not
conserved), the volume (Lebesgue measure of a set) remains unchanged under the
flow.


Remark 2.5. On the Initial Value Problem 

The Lebesgue measure in phase space plays a distinguished role for the Hamiltonian 
flow. It is thus natural to weaken the problem of initial values in the sense of the 
measure, so that one is happy if it can be shown that the bad set of initial conditions 
for which no global solutions exist has Lebesgue measure zero. Be warned however, 
that a set which has measure zero may be small in the sense of the measure, but it 
is not necessarily small in the sense of cardinality (number of points in the set). The 
famous Cantor set, a subset of the interval [0, 1], has as many members as the reals, 
but has Lebesgue measure zero. 

We close this section with what is a heretical thought for modern physicists, 
namely, a Newtonian universe. This is not physical (we know it ignores quantum 
mechanics and much more), but we can nevertheless conceive of it, and it is a good 
enough framework in which to ask a question which in an appropriate sense could 
be raised in any other theory: Which initial values give rise to OUR Newtonian 
universe? Put another way: According to which criteria were the initial values of 
OUR universe chosen? We do not ask who chose the initial conditions, but rather: 
Which physical law determines them? One possible answer to this could be that our 
universe is nothing special, that it could be typical in the sense that almost all initial 
conditions would give rise to a universe very much like ours (where “almost all”
means that the Lebesgue measure of the set which does not give rise to a universe
like ours is very small). It turns out that this is not the case, but we shall address this
issue later.


2.3 Hamilton–Jacobi Formulation


The Hamiltonian structure and phase space are intimately connected with symplec- 
tic geometry. We shall say more about that at the end of the chapter. We wish to 
move on to another question. The Hamiltonian formulation of Newtonian mechan- 
ics is prosaic and brings out the particular structure shared by Newtonian mechanics 
and all Hamiltonian flows: conservation of energy (if H is time independent) and 
phase space volume. But that was not Hamilton’s aim. He had a much deeper vi- 
sion for mechanics. He was looking for an analogy between mechanics and wave 
optics, namely Huygens’ principle and Fermat’s extremal principle of geometric 
optics, according to which light rays take the path requiring the shortest time, and 
moreover follow the normals of wave fronts. Could mechanics be formulated by a 
similar guidance principle, where the mechanical paths are determined by the normal
vectors of wave fronts? The extremal principle replacing Fermat’s is the least
action principle $\delta \int L\,dt = 0$, where $L(q,\dot q)$ is called the Lagrange function. The
mechanical (Newtonian) trajectories between $t_0, q_0$ and $t, q$ (note that instead of initial
position and initial velocity, we consider initial position and end position) are
characterized as the extremals of


$$\int_{t_0}^{t} L(q(t'),\dot q(t'))\,dt'.$$


We omit the derivation of the Euler-Lagrange equation, which is standard, but we 
recall that for Newtonian mechanics 


$$L(q,\dot q) = \frac{1}{2}\,\dot q\cdot m\dot q - V(q). \tag{2.30}$$


For this Lagrange function, the Euler-Lagrange equations are the usual Newtonian 
equations. The Lagrange function and Hamilton function are Legendre transforms 
of one another.⁴ (It is remarkable that almost all the great mathematicians of the
19th century have left some trace in theoretical mechanics.) Starting with $H$, we get
$L$ by changing from the variable $p$ to $\dot q$, so that $(q,p)$ gets replaced by $(q,\dot q)$, using
the implicitly given function


$$\dot q = \frac{\partial H(q,p)}{\partial p},$$


where the equation is solved by $p$ as a function of $\dot q$.


⁴ Here is the definition. Let $f(x)$ be convex and let $z$ be a given slope. Then look for the point $x(z)$ at
which the tangent to the graph of $f$ has slope $z$. You find $x(z)$ by minimizing $F(x,z) = f(x) - xz$ in
$x$. By convexity of $f$, this is uniquely determined. The Legendre transform of $f$ is $g(z) = F(x(z),z)$.

For “normal” Hamilton functions (quadratic in the momentum), that solution is 
immediate, and looking at (2.30), the Legendre transform pops right up: 


$$L(q,\dot q) = p\cdot\dot q - H(q,p). \tag{2.31}$$


Note in passing that if one starts with the least action principle as being fundamen- 
tal, one can guess the form of the Lagrange function from basic principles such 
as symmetry, homogeneity, and simplicity. But that is not what we wish to discuss 
here. 
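
As a worked check of (2.31), and assuming the Newtonian Hamilton function (2.6) with its diagonal mass matrix, the Legendre transform can be carried out explicitly. The following lines are our own spelled-out version of the step which the text calls immediate:

```latex
\begin{align*}
\dot q &= \frac{\partial H}{\partial p}(q,p) = m^{-1}p
  \quad\Longrightarrow\quad p = m\dot q, \\
L(q,\dot q) &= p\cdot\dot q - H(q,p)
  = m\dot q\cdot\dot q - \tfrac{1}{2}\,(m\dot q)\cdot m^{-1}(m\dot q) - V(q) \\
  &= \tfrac{1}{2}\,\dot q\cdot m\dot q - V(q),
\end{align*}
```

which is the Lagrange function (2.30).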

We come now to Huygens’ principle and the definition of waves $S_{q_0,t_0}(q,t)$
which guide mechanical trajectories starting at $q_0$ and moving along the vector field
$p(q,t) = \nabla S_{q_0,t_0}(q,t)$. Hamilton suggested


$$S_{q_0,t_0}(q,t) = \int_{t_0}^{t} L(\gamma,\dot\gamma)\,dt', \tag{2.32}$$


where $\gamma: q_0,t_0 \longrightarrow q,t$ is the extremum of the action principle, i.e., the Newtonian
path. This function is often called the Hamilton–Jacobi function.

Unfortunately, this definition generally leads to a multivalued function. Take for 
example the harmonic oscillator with period T. There are many extremal trajectories 
for a harmonic oscillator with period T which go from (0,T) to (0,2T), so S is not
uniquely defined. Or again, think of a ball which bounces off a wall. The position 
q in front of the wall can always be reached within a given time by two routes, one 
with and one without reflection from the wall. 

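However, the definition is good for short enough times. So never mind this difficulty,
let us pursue the thought. Ignoring the dependence on $(q_0,t_0)$ and considering⁵

\[ dS = \frac{\partial S}{\partial q}\,dq + \frac{\partial S}{\partial t}\,dt \]

and in view of (2.32) and (2.31), we can identify

\[ dS = p\,dq - H(q,p)\,dt . \]

Then by comparison

\[ p(q,t) = \frac{\partial S}{\partial q}(q,t) , \tag{2.33} \]

whence

\[ \frac{\partial S}{\partial t}(q,t) + H\Big(q,\frac{\partial S}{\partial q}\Big) = 0 . \tag{2.34} \]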


5 We may ignore that dependence because we assume uniqueness of the trajectory, and this implies
that $S_{q_0,t_0}(q,t) = S_{q_1,t_1}(q,t) + S_{q_0,t_0}(q_1,t_1)$.
This is known as the Hamilton-Jacobi differential equation. For Newtonian mechanics,
where $\dot q_i = p_i/m_i$, we then obtain the following picture. On the configuration
space $\mathbb R^n$ for $N$ particles ($n = 3N$), we have a function $S(q,t)$ (“unfortunately”
multivalued), whose role it is to generate a vector field

\[ v(q,t) = m^{-1} \nabla S(q,t) \tag{2.35} \]

on configuration space. Integral curves $Q(t)$ along the vector field are the possible
trajectories of the $N$-particle system, i.e., they solve

\[ \frac{dQ}{dt} = v(Q(t),t) . \]

The function $S(q,t)$ is itself dynamical and solves the nonlinear partial differential
equation

\[ \frac{\partial S}{\partial t}(q,t) + \frac{1}{2} \sum_{i=1}^{n} \frac{1}{m_i} \Big( \frac{\partial S}{\partial q_i} \Big)^2 + V(q) = 0 . \tag{2.36} \]


This picture is, as we said, not quite right, because S is in general not well defined 
for mechanics. On the other hand, it is almost quantum mechanics. We shall soon 
understand which little quantum is missing to get the picture right. 
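
As a quick plausibility check of (2.36), one can insert known Hamilton–Jacobi functions and evaluate the left-hand side by finite differences. The sketch below is only an illustration under our own assumptions: one dimension, $m = 1$, the free particle with $S = m(q-q_0)^2/2(t-t_0)$, and the harmonic oscillator with its standard classical action. The function names are hypothetical. The oscillator action is singular whenever $\sin\omega(t-t_0) = 0$, which reflects the fact that the definition is only good for short enough times.

```python
import math


def action_free(q, t, q0=0.0, t0=0.0, m=1.0):
    """Hamilton-Jacobi function (2.32) for a free particle, V = 0."""
    return m * (q - q0) ** 2 / (2.0 * (t - t0))


def action_oscillator(q, t, q0=0.3, t0=0.0, m=1.0, omega=1.0):
    """Classical action for V(q) = m * omega^2 * q^2 / 2.

    Singular when sin(omega * (t - t0)) vanishes.
    """
    T = t - t0
    s, c = math.sin(omega * T), math.cos(omega * T)
    return m * omega / (2.0 * s) * ((q * q + q0 * q0) * c - 2.0 * q * q0)


def hj_residual(S, V, q, t, m=1.0, h=1e-5):
    """Left-hand side of (2.36) in one dimension, via central differences."""
    dS_dt = (S(q, t + h) - S(q, t - h)) / (2.0 * h)
    dS_dq = (S(q + h, t) - S(q - h, t)) / (2.0 * h)
    return dS_dt + dS_dq ** 2 / (2.0 * m) + V(q)


def velocity_field(S, q, t, m=1.0, h=1e-5):
    """Equation (2.35) in one dimension: v = (1/m) dS/dq."""
    return (S(q + h, t) - S(q - h, t)) / (2.0 * h * m)
```

For the free particle the velocity field reduces to $(q-q_0)/(t-t_0)$, the velocity of the straight path from $q_0$ to $q$.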


2.4 Fields and Particles: Electromagnetism 


Many hold the view that the particle is not a good concept for physics. They see it as 
a classical Newtonian concept which has been made obsolete by quantum mechan- 
ics. Fields on the other hand are generally well accepted, because relativity teaches 
us that the right physical theory will be a field theory, eventually quantized of course. 
To understand whether fields do work as well as one hopes, we shall have a quick 
look at electromagnetism, where dynamical fields come into play as fundamental 
objects needed to describe interactions between particles which carry a “charge”. 
Electromagnetic fields act on a particle at position $\mathbf q \in \mathbb R^3$ via the Lorentz force:

\[ m \ddot{\mathbf q} = e \Big( \mathbf E(\mathbf q,t) + \frac{\dot{\mathbf q}}{c} \times \mathbf B(\mathbf q,t) \Big) , \tag{2.37} \]

where $\mathbf E(\mathbf q,t)$ and $\mathbf B(\mathbf q,t)$ are the electric and magnetic fields, and $c$ is the velocity
of light. While the fields act on particles as described, in electromagnetism the fields
are not independent agents living in a kingdom of their own, for they are themselves
generated by particles. They are generated by particles and they act on particles,
which is why one may say that they are there to represent the interaction between
particles. But when the particles are point particles, which is the most natural
relativistic possibility, this does not go down well with the fields. We shall explain this
now. We shall also use the opportunity to introduce relativistic physics.

Albert Einstein (1879-1955) deduced from Maxwell’s equations of electromag- 
netism that space and most importantly time change in a different way from Galilean 
physics when one changes between frames moving with respect to each other. The 
nature of this change is governed by the fact that the velocity of light does not 
change, when one moves from one frame to another. This led to the understanding, 
soon to be formalised in the four-dimensional description by Hermann Minkowski 
(1864-1909), that space and time are “of the same kind”. That is, a particle needs 
for its specification not only a position in space, but also a location in time, imply- 
ing that the coordinates of a particle in relativistic physics should be space and time 
coordinates. This is a revolution in Newtonian mechanics, where we are of course 
used to particles having different positions, but not different times. So in relativistic 
physics one must get used to the spacetime description of particles, with each par- 
ticle having its own spacetime coordinates. In other words, the configuration space 
of classical mechanics, where we collect all positions of the particles of a system 
at the same time, no longer plays a fundamental role. Instead, Einstein showed that 
the overthrow of absolute time brings physics closer to a true description of nature. 
On this basis, he believed that physical theories must be local, in the sense that no 
physical effect can move faster than light. John Bell showed that this is wrong. We 
shall devote a whole chapter to this later, but now we must move on. 

Minkowski introduced the so-called four-dimensional spacetime with a particular
scalar product.⁶ In spacetime, particles no longer move in a Newtonian way, but
according to new dynamics. The particle position is now $x^\mu$, $\mu = 0,1,2,3$, where
$x^0$ is selected as the time coordinate, since it is distinguished by the “signature” of
the so-called Minkowski length⁷

\[ ds^2 = (dx^0)^2 - d\mathbf x^2 = (dx^0)^2 - \sum_{i=1}^{3} (dx^i)^2 . \]

In Newtonian mechanics, we are used to parameterizing paths by time, which is no
longer natural. A natural parameter now is length, namely Minkowski length, normalized
by $1/c$, i.e., on each trajectory we select a zero mark somewhere and go from there


6 Minkowski suggested using imaginary numbers for the time coordinate in the spacetime
coordinates $(x_0 = ict, \mathbf x)$, because then the formal Euclidean scalar product yields the
Minkowski metric $s^2 = (x_0)^2 + \mathbf x^2 = -c^2 t^2 + \sum_{i=1}^{3} (x^i)^2$. The advantage is that all
congruences, i.e., transformations leaving the scalar product invariant (which form the
so-called Lorentz group) appear in Euclidean guise, and so can be viewed as four-dimensional
rotations and reflections. This differs from the Galilean case, where the change to relatively
moving frames (Galilean boosts) must be dealt with separately. In the Minkowski case, the
corresponding Lorentz boost is simply a “rotation”, but with an imaginary angle. When one
considers the changes of $x_0$ and, say, $x_1$, such a rotation yields
$x_0' = x_0 \cos\varphi - x_1 \sin\varphi$, $x_1' = x_0 \sin\varphi + x_1 \cos\varphi$, which requires in the
Minkowski case $\varphi = i\psi$, an imaginary angle. Following the point $x_1 = 0$ in the primed
frame (moving with relative velocity $v$), one then has $x_1'/ct' = v/c = \tanh\psi$, which yields
the well-known formula for the Lorentz boost.


7 The sign of $ds^2$ is conventional: $ds^2 < 0$ implies a spacelike distance, while $ds^2 > 0$
implies a timelike distance.
with the length element of the $i$th particle

\[ d\tau_i = \frac{1}{c}\, ds_i = \frac{1}{c} \sqrt{ \Big( \frac{dx_i^0}{d\tau_i} \Big)^2 - \Big( \frac{d\mathbf x_i}{d\tau_i} \Big)^2 }\; d\tau_i . \]

Thus

\[ \dot x_i^2 = g_{\mu\nu}\, \dot x_i^\mu \dot x_i^\nu = c^2 , \qquad g_{\mu\nu} = \begin{pmatrix} 1 & 0 \\ 0 & -E_3 \end{pmatrix} , \tag{2.38} \]

where $E_3$ is the $3 \times 3$ identity matrix, and we use the Einstein summation convention
according to which we sum automatically over those indices appearing more than
once. The dot over $x_i$ indicates the derivative with respect to Minkowski length $\tau_i$,
also called the proper time, of the $i$th particle (in the frame where the particle is
at rest $x^0 = c\tau_i$). The metric tensor $g_{\mu\nu}$ which defines here the Minkowski scalar
product can be used to lower indices:

\[ x_\mu := g_{\mu\nu}\, x^\nu , \]

while the inverse of the metric tensor denoted by $g^{\mu\nu}$ is used to raise indices.

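For us it is natural to parameterize the trajectory by the coordinate time $x^0 = ct$
of our rest frame where we see the particle move with velocity $\mathbf v$. We thus have a
new function

\[ \bar x^\mu(x^0) = \big( x^0, \mathbf x(x^0) \big) = x^\mu\big( \tau(x^0) \big) , \]

for which we get by the chain rule

\[ \frac{d\bar x^\mu}{dt} = (c, \mathbf v) = \dot x^\mu\, \frac{d\tau}{dt} , \]

or, taking the Minkowski square,

\[ c^2 - v^2 = c^2 \Big( \frac{d\tau}{dt} \Big)^2 , \tag{2.39} \]

which allows us to switch between proper time and coordinate time.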
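
The switch between proper time and coordinate time can be made concrete in a few lines. The following Python sketch is only an illustration; the function names are our own. It evaluates $d\tau/dt = \sqrt{1 - v^2/c^2}$, builds the proper-time derivative $\dot x^\mu = (c,\mathbf v)\,dt/d\tau$, and computes its Minkowski square, which should equal $c^2$ as in (2.38).

```python
import math

C = 299_792_458.0  # speed of light in m/s


def dtau_dt(v, c=C):
    """Rate of proper time per coordinate time: sqrt(1 - v^2/c^2)."""
    return math.sqrt(1.0 - (v / c) ** 2)


def four_velocity(v3, c=C):
    """Proper-time derivative of x^mu = (ct, x) for three-velocity v3."""
    speed = math.sqrt(sum(vi * vi for vi in v3))
    gamma = 1.0 / dtau_dt(speed, c)
    return (gamma * c,) + tuple(gamma * vi for vi in v3)


def minkowski_square(u):
    """g_{mu nu} u^mu u^nu with signature (+, -, -, -)."""
    return u[0] ** 2 - sum(ui ** 2 for ui in u[1:])
```

For a particle moving with $v = 0.6\,c$, one second of coordinate time corresponds to $0.8$ seconds of proper time.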

The relativistic dynamics of a “free” particle may be defined by an extremal
principle which determines the physical spacetime path from $x$ to $y$ as the path with
extremal Minkowski length. This means that the variation

\[ \delta \int_x^y ds = \delta \int_{\tau(x)}^{\tau(y)} ( \dot x^\mu \dot x_\mu )^{1/2}\, d\tau = 0 . \]


If we wish to talk about relativistic mechanics in Newtonian terms, i.e., if we wish
to use Newtonian concepts like energy, mass, and force and the like (and we might
wish to do that because it may then be easier to arrive at Newtonian mechanics
as a limiting description of relativistic mechanics), we can multiply the integral
by dimensional constants to get an action integral, so that the terms in the
Euler–Lagrange equation read analogously to the Newtonian terms. That is, writing
$S = -mc \int ds$ and using (2.39) in the $x^0 = ct$ parametrization, the Lagrange function
becomes

\[ L(\dot{\mathbf q}) = -mc^2 \sqrt{1 - \frac{v^2}{c^2}} . \]


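The Euler-Lagrange equations lead to the canonical momentum

\[ \mathbf p = \frac{m}{\sqrt{1 - v^2/c^2}}\, \mathbf v =: \tilde m \mathbf v , \]

from which we recognize $m$ as the rest mass, because $\mathbf p \approx m \mathbf v$ when $v \ll c$.
The canonical momentum can be taken as the vector part of a canonical four-momentum

\[ p^\mu = m \dot x^\mu , \]

the Minkowski length of which is [see (2.38)]

\[ g_{\mu\nu}\, p^\mu p^\nu = p^\mu p_\mu = m^2 \dot x^\mu \dot x_\mu = m^2 c^2 . \tag{2.40} \]

Hence, parameterizing by $x^0 = ct$, we find

\[ p^\mu = m \dot x^\mu = \frac{m}{\sqrt{1 - v^2/c^2}}\, (c, \mathbf v) . \]

Observing further that

\[ p^0 = \frac{mc}{\sqrt{1 - v^2/c^2}} \approx mc \Big( 1 + \frac{1}{2} \frac{v^2}{c^2} + \dots \Big) = mc + \frac{m}{2c}\, v^2 + \cdots , \]

we are led to set $E = p^0 c$. With (2.40), we thus obtain the energy-momentum relation:

\[ E = \sqrt{\mathbf p^2 c^2 + m^2 c^4} = \tilde m c^2 . \]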
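
The energy-momentum relation and its Newtonian limit are easy to check numerically. The following Python sketch works in units with $c = 1$ by default; the function names are our own illustrative choices. It compares $E = \sqrt{p^2c^2 + m^2c^4}$ with $mc^2/\sqrt{1 - v^2/c^2}$ and with the first two terms of the expansion, rest energy plus Newtonian kinetic energy.

```python
import math


def gamma_factor(v, c=1.0):
    """1 / sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def momentum(m, v, c=1.0):
    """Canonical momentum p = m v / sqrt(1 - v^2/c^2)."""
    return gamma_factor(v, c) * m * v


def energy_from_momentum(m, p, c=1.0):
    """Energy-momentum relation E = sqrt(p^2 c^2 + m^2 c^4)."""
    return math.sqrt((p * c) ** 2 + (m * c * c) ** 2)


def energy_from_velocity(m, v, c=1.0):
    """E = p^0 c = m c^2 / sqrt(1 - v^2/c^2)."""
    return gamma_factor(v, c) * m * c * c


def newtonian_energy(m, v, c=1.0):
    """Rest energy plus Newtonian kinetic energy."""
    return m * c * c + 0.5 * m * v * v
```

For small $v/c$ the difference between the exact and the Newtonian expression is of order $m v^4/c^2$.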
Now we have for $N$ particles the spacetime trajectories $q_i = \big( q_i^\mu(\tau_i) \big)$, $\mu = 0,1,2,3$,
$i = 1,\dots,N$. Let us introduce a force $K^\mu$, which accelerates the particles. By virtue
of (2.38), we have

\[ \ddot x^\mu \dot x_\mu = 0 , \]

and this suggests

\[ K^\mu \dot x_\mu = 0 , \]

that is, the force should be orthogonal to the velocity in the sense of the Minkowski
metric. The simplest way to achieve this is to put

\[ K^\mu = F^{\mu\nu} \dot x_\nu , \]

with $F^{\mu\nu} = -F^{\nu\mu}$, an antisymmetric tensor of rank 2, i.e., an antisymmetric $4 \times 4$
matrix. One way to generate such a tensor is to use a four-potential $A^\mu$:

\[ F^{\mu\nu} = \frac{\partial}{\partial x_\mu} A^\nu - \frac{\partial}{\partial x_\nu} A^\mu . \tag{2.41} \]


The Maxwell–Lorentz theory of electromagnetic interaction has the force act on the
particles through a law that involves not only masses as parameters, but also charges
$e_i$:

\[ m_i \ddot q_i^\mu = \frac{e_i}{c}\, F^\mu{}_\nu(q_i)\, \dot q_i^\nu . \tag{2.42} \]

In view of (2.37), one names the matrix elements as follows:

\[ F^\mu{}_\nu(x) = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{pmatrix} , \tag{2.43} \]

recalling that indices are lowered or raised by action of $g^{\mu\nu} = g_{\mu\nu}$, whence
$F^\mu{}_\nu = F^{\mu\lambda} g_{\lambda\nu}$.
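For the three-vector $\mathbf q_i(\tau_i)$, we then obtain

\[ m_i \ddot{\mathbf q}_i = \frac{e_i}{c} \big( \mathbf E\, \dot q_i^0 + \dot{\mathbf q}_i \times \mathbf B \big) , \]

where dots over symbols still refer to derivatives with respect to $\tau_i$. For small
velocities (compared to the velocity of light), this is close to (2.37).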
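
The relation between (2.42), (2.43), and (2.37) can be verified componentwise. The Python sketch below is a minimal illustration with made-up field values and our own function names. It builds the matrix (2.43), applies it to the four-velocity, and allows one to compare the spatial part with the Lorentz force and to check that the four-force is Minkowski-orthogonal to the velocity.

```python
import math


def field_tensor_mixed(E, B):
    """Matrix F^mu_nu of (2.43) for three-vectors E and B."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [
        [0.0, E1, E2, E3],
        [E1, 0.0, B3, -B2],
        [E2, -B3, 0.0, B1],
        [E3, B2, -B1, 0.0],
    ]


def four_velocity(v, c=1.0):
    """Proper-time derivative (gamma c, gamma v) of the worldline."""
    gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v) / c ** 2)
    return [gamma * c] + [gamma * vi for vi in v]


def four_force(e, E, B, v, c=1.0):
    """Right-hand side of (2.42): (e/c) F^mu_nu qdot^nu."""
    F = field_tensor_mixed(E, B)
    u = four_velocity(v, c)
    return [e / c * sum(F[mu][nu] * u[nu] for nu in range(4)) for mu in range(4)]


def lorentz_force(e, E, B, v, c=1.0):
    """Right-hand side of (2.37): e (E + v x B / c)."""
    cross = [
        v[1] * B[2] - v[2] * B[1],
        v[2] * B[0] - v[0] * B[2],
        v[0] * B[1] - v[1] * B[0],
    ]
    return [e * (E[i] + cross[i] / c) for i in range(3)]


def minkowski_dot(a, b):
    """Scalar product with signature (+, -, -, -)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))
```

The spatial part of the four-force equals the Lorentz force multiplied by $1/\sqrt{1 - v^2/c^2}$, which is close to one for small velocities.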

But the fields $F^{\mu\nu}$ are themselves generated by the particles, and this is supposed
to give the interaction between the charges. The equation which describes the generation
of fields is not difficult to guess, by analogy with the gravitational potential
which is given by the potential equation $\Delta V = \nabla \cdot \nabla V = \delta(x)$ for a point mass at the
origin. (Note that the scalar product construction of the law is a good trick for making
the law invariant under Euclidean congruences.) Taking the four-dimensional $\nabla$
in the Minkowski sense suggests the four-dimensional potential equation (invariant
under Minkowskian congruences)

\[ \Big[ \Big( \frac{\partial}{\partial x^0} \Big)^2 - \Big( \frac{\partial}{\partial \mathbf x} \Big)^2 \Big] A^\mu = \Box A^\mu = \frac{4\pi}{c}\, j^\mu , \tag{2.44} \]

where $j^\mu$ is the current originating from moving particles carrying charge $e_i$. We
discuss the current below.

Note in passing that (2.41) does not determine the vector potential uniquely, because
a term of the form $\partial f/\partial x_\mu$ for some well-behaved $f$ can always be added
to $A^\mu$ without changing the forces. This is called gauge invariance. Equation (2.44)
determines the potential in what is known as the Lorentz gauge, where

\[ \frac{\partial}{\partial x^\mu} A^\mu = 0 . \]
The current is a distribution, because the charges are concentrated at the positions 
of the particles, which are points. One could think of smeared out charges, but a 
charge with a certain extension (a ball, for example) would not be a relativistic object 
because of Lorentz contraction, i.e., a ball would not remain a ball when moving.⁸
Lorentz contraction is an immediate consequence of the loss of simultaneity, since 
the length of a rod is defined by the spatial distance of the end points of the rod 
when taken at the same time. 
The current of a point charge is by itself unproblematic. It has the following
frame-independent, i.e., relativistic, form:

\[ j^\mu(x) = \sum_i e_i c \int \delta^4\big( x - q_i(\tau_i) \big)\, \dot q_i^\mu(\tau_i)\, d\tau_i , \tag{2.45} \]

with

\[ \delta^4(x) = \prod_{\mu=0}^{3} \delta(x^\mu) , \]

and we use $x$ for the four vector $x = (x^0, \mathbf x)$, since we have used the boldface notation
for three-dimensional vectors. Better known is the form in a coordinate frame. With
$x^0 = ct$, we obtain (writing the integral as a line integral along the trajectory in the
second equality below)


8 Suppose one did not care about relativistic invariance, and took a small ball in the rest frame of the 
electron (for instance, with radius 10~!° cm, which seems to be an upper bound from experimental 
data). Unfortunately, this yields an effective mass of the electron larger than the observed electron 
mass. The rough argument is that the Coulomb energy of a concentrated charge (infinite for a point 
charge) yields by the energy—mass relation a field mass which, since it moves with the electron 
must be accelerated as well, and is effectively part of the electron mass. Extended Lorentz invariant 
charge distributions entail dynamical problems as well, when strong accelerations occur [1-4]. 
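\[
\begin{aligned}
j^\mu(x) &= \sum_i e_i c \int \delta^4\big( x - q_i(\tau_i) \big)\, \dot q_i^\mu(\tau_i)\, d\tau_i \\
&= \sum_i e_i c \int_{\{ q_i(\tau_i) \,|\, \tau_i \in \mathbb R \}} \delta^4(x - q_i)\, dq_i^\mu \\
&= \sum_i e_i c \int \delta^3\big( \mathbf x - \mathbf q_i(t_i) \big)\, \delta(ct - ct_i)\, \frac{dq_i^\mu}{dt_i}(t_i)\, dt_i \\
&= \sum_i e_i\, \delta^3\big( \mathbf x - \mathbf q_i(t) \big)\, \frac{dq_i^\mu}{dt}(t) .
\end{aligned}
\]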


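From (2.45) we easily obtain the continuity equation

\[ \frac{\partial}{\partial x^\mu}\, j^\mu(x) = 0 . \]

Using the fact that the trajectories are timelike, we have

\[
\begin{aligned}
\frac{\partial}{\partial x^\mu}\, j^\mu(x) &= \sum_i e_i c \int \frac{\partial}{\partial x^\mu}\, \delta^4\big( x - q_i(\tau_i) \big)\, \frac{dq_i^\mu}{d\tau_i}\, d\tau_i \\
&= -\sum_i e_i c \int_{\{ q_i(\tau_i) \,|\, \tau_i \in \mathbb R \}} \frac{\partial}{\partial q_i^\mu}\, \delta^4(x - q_i)\, dq_i^\mu \\
&= -\sum_i e_i c \Big[ \lim_{\tau \to \infty} \delta^4\big( x - q_i(\tau) \big) - \lim_{\tau \to -\infty} \delta^4\big( x - q_i(\tau) \big) \Big] \\
&= 0 , \qquad \forall x \in \mathbb R^4 .
\end{aligned}
\]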


The system of differential equations (2.41)–(2.44) defines the theory of charges
interacting via fields. Now (2.42) is unproblematic if the field $F^\mu{}_\nu$ is given as a
well-behaved function. The linear partial differential equation (2.44) is likewise
unproblematic if $j^\mu$ is given, even as a distribution, as in (2.45). Then one has a Cauchy
problem, i.e., in order to solve (2.44), one needs initial data $A^\mu(0,x^1,x^2,x^3)$ and
$(\partial A^\mu/\partial t)(0,x^1,x^2,x^3)$, and there is no obstacle to finding a solution.

However, we now have to solve (2.42)–(2.44) together, rather than separately, and
this does not work. The system of differential equations is only formal, i.e., there
are no functions $q^\mu(\tau)$ and $A^\mu(x)$, whose derivatives would satisfy the equations.
It does not even matter whether we have more than one particle. Let us take one
particle to see what goes wrong. First solve

\[ A^\mu(x) = \Big( \Box^{-1}\, \frac{4\pi}{c}\, j^\mu \Big)(x) = \int \Box^{-1}_{x,x'}\, \frac{4\pi}{c}\, j^\mu(x')\, d^4x' , \tag{2.46} \]

with a Green’s function $\Box^{-1}_{x,x'}$ given by

\[ \Box\, \Box^{-1}_{x,x'} = \delta^4(x - x') . \tag{2.47} \]
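The Green’s function is not unique. The different possible Green’s functions differ
by solutions of the homogeneous equation ($j = 0$), i.e., by functions in the kernel of
$\Box$. A symmetric choice is⁹

\[ 4\pi\, \Box^{-1}_{x,x'} = \delta\big( (x - x')^2 \big) = \delta\big( (x^\mu - x'^\mu)(x_\mu - x'_\mu) \big) . \]

Why is that symmetric? Using

\[ \delta\big( f(x) \big) = \sum_k \frac{1}{|f'(x_k)|}\, \delta(x - x_k) , \tag{2.48} \]

where $x_k$ are the single zeros of $f$ we get

\[ \delta(x^2 - y^2) = \frac{1}{2} \frac{\delta(x - y)}{|y|} + \frac{1}{2} \frac{\delta(x + y)}{|y|} , \]

and thus

\[
\begin{aligned}
4\pi\, \Box^{-1}_{x,y} &= \delta\big( (x - y)^2 \big) = \delta\big( (x^0 - y^0)^2 - (\mathbf x - \mathbf y)^2 \big) \\
&= \frac{1}{2}\, \frac{\delta\big( (x^0 - y^0) - |\mathbf x - \mathbf y| \big)}{|\mathbf x - \mathbf y|} + \frac{1}{2}\, \frac{\delta\big( (x^0 - y^0) + |\mathbf x - \mathbf y| \big)}{|\mathbf x - \mathbf y|} ,
\end{aligned}
\]

which is the sum of retarded and advanced Green’s function, a notion which will
become clear shortly. Any combination of the retarded and advanced Green’s functions
with weights adding up to one is a possible $\Box^{-1}_{x,y}$, and
one commonly uses only the retarded part, using the argument that one experiences
only radiation emitted in the past. We shall say more about that later. One may
convince oneself by formal manipulations¹⁰ that (2.47) holds for this or any other
such combination.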


Now let us come to the end of the story. With (2.46) and (2.45), we get

\[ A^\mu(x) = e \int \delta\big( (x - q(\tau))^2 \big)\, \dot q^\mu(\tau)\, d\tau , \tag{2.49} \]


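9 It is quite natural for the Green’s function to be like this. It is the most natural relativistic
function one can write down. The points which have Minkowski distance zero from one another form
the (backward and forward) light cones and they are special in a certain sense. So the function is
not eccentric in any way.


10 For example,

\[
\begin{aligned}
\Big( \frac{\partial^2}{\partial (x^0)^2} - \Delta \Big) \frac{\delta(x^0 - |\mathbf x|)}{|\mathbf x|}
&= \frac{1}{|\mathbf x|}\, \delta''(x^0 - |\mathbf x|) - \Delta\Big( \frac{1}{|\mathbf x|} \Big)\, \delta(x^0 - |\mathbf x|) \\
&\quad - 2\, \nabla\Big( \frac{1}{|\mathbf x|} \Big) \cdot \nabla \delta(x^0 - |\mathbf x|) - \frac{1}{|\mathbf x|}\, \Delta \delta(x^0 - |\mathbf x|) \\
&= 4\pi\, \delta(\mathbf x)\, \delta(x^0) ,
\end{aligned}
\]

using the chain rule on $\nabla\delta$.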
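and with (2.48), we find

\[ A^\mu(x) = e\, \frac{\dot q^\mu(\tau_r)}{2\, \big| (x^\nu - q^\nu(\tau_r))\, \dot q_\nu(\tau_r) \big|} + e\, \frac{\dot q^\mu(\tau_a)}{2\, \big| (x^\nu - q^\nu(\tau_a))\, \dot q_\nu(\tau_a) \big|} , \]

where $\tau_r$ and $\tau_a$ are the solutions of

\[ (x - q(\tau))^2 = 0 \iff x^0 - q^0(\tau) = \pm |\mathbf x - \mathbf q(\tau)| , \]

with the upper sign for the retarded time $\tau_r$ and the lower sign for the advanced time $\tau_a$.

We can now understand the terminology used. The retarded/advanced time is given
by the intersection of the backward/forward light cone based at $x$ with the trajectory
of the particle (see Fig. 2.5, where $x$ should be thought of as an arbitrary point).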
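
To see how retarded and advanced times arise in practice, one can solve the light cone condition numerically. The Python sketch below is a minimal illustration under our own assumptions: units with $c = 1$, a worldline parameterized by coordinate time, and a particle moving slower than light, so that bisection applies. The uniformly moving worldline and all names are hypothetical.

```python
import math


def light_cone_time(x0, x, trajectory, sign, span=1e6):
    """Solve x0 - t = sign * |x - q(t)| for t by bisection (c = 1).

    sign = +1 gives the retarded time, sign = -1 the advanced time.
    For a worldline slower than light, t -> x0 - t - sign * |x - q(t)|
    is strictly decreasing, so the root in the bracket is unique.
    """
    def g(t):
        return x0 - t - sign * math.dist(x, trajectory(t))

    lo, hi = x0 - span, x0 + span
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform_motion(t, v=0.5):
    """Illustrative worldline: constant velocity v along the first axis."""
    return (v * t, 0.0, 0.0)
```

For a point $x$ off the worldline, the retarded time lies before $x^0$ and the advanced time after it. For a point on the worldline, both times coincide with $x^0$, which is exactly the situation in which the field becomes singular.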

We have thus derived a field $A^\mu$ (see for instance [5], if some manipulations seem
unclear) which is well-behaved everywhere except at points $x$ lying on the worldline
$q$ of the charge generating the field. For $x = q(\tau)$, we have $\tau_r = \tau = \tau_a$ and we
see that the denominator is zero. But this is now the end of the story, since these are
the $x$ values which are needed in (2.42).

This problem is well known by the name of the electron self-interaction. The 
field which the electron generates acts back on the electron, and this back-reaction 
is mathematically ill-defined, since the electron is a point. Hence, the field idea 
for managing interactions between point charges does not work, unless one intro- 
duces formal manipulations like renormalization [6], or changes electromagnetism 
on small scales [7]. 

The Maxwell—Lorentz theory of electromagnetism works well (in the sense of 
describing physical phenomena correctly) when the fields are generated by smeared 
out charges (charge clouds), so one can describe the radiation from an antenna. It 
also works when the fields are given as “external” fields, which act on charges by the 
Lorentz force equation (see [3, 8, 9] for mathematical proofs concerning existence 
and uniqueness). In short, electromagnetism is fine for most non-academic life. One 
may ask why this is so. The reason may be that the Maxwell—Lorentz theory is the 
macroscopic description of the fundamental theory of electromagnetism we shall 
describe next. But that theory does not contain fields on the fundamental level. 


2.5 No Fields, Only Particles: Electromagnetism


What is bad about fields when it comes to describing interactions? The problem is
that the field naturally acts everywhere, and thus also on the very particle which
generates the field. But taking the meaning of relativistic interaction between particles
seriously, why does one need fields at all? Why not get rid of the field and have
the particles interact directly? In a sense one does this when solving (2.44) for $A^\mu$
and putting that into (2.42). Fokker thought that way [10], like many others before
him, including Gauss [11], Schwarzschild [12], and Tetrode [13]. He wrote down
the variational principle for a relativistic particle theory, which was later rediscovered
by Wheeler and Feynman [14, 15], to explain retarded radiation. How can the
particles interact directly in a relativistic way?

Fig. 2.5 In Feynman–Wheeler electromagnetism particles interact along backward and forward
light cones. Here we set c = 1

There is no natural notion of simultaneity whereby a force can act at the same
time between particles, but we already know that the simplest choice is to take the
Minkowski distance and to say that particles interact when there is distance zero
from one to the other. Hence the particle at spacetime point $q$ interacts with all other
particles at spacetime points which are intersection points of light cones based at $q$
with the other trajectories, that is, when

\[ (q_i^\mu - q_j^\mu)(q_{i\mu} - q_{j\mu}) = (q_i - q_j)^2 = 0 , \]

or put another way, when $\delta\big( (q_i - q_j)^2 \big)$ is not zero.

Note that there are always two light cones based at one point, one directed towards
the future and one directed towards the past (see Fig. 2.5), although of course
the physical law is not concerned about such notions as past and future.

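It is rather clear that dynamics which is defined by future times and past times
can no longer be given by differential equations of the ordinary kind. But some
differential equations can nevertheless be written down from a variational principle.
The Fokker–Wheeler–Feynman action $S$ is the simplest relativistic action one could
think of describing interacting particles:

\[ S = \sum_i \Big[ -m_i c \int ds_i - \sum_{j>i} \frac{e_i e_j}{c} \int\!\!\int \delta\big( (q_i - q_j)^2 \big)\, dq_i^\mu\, dq_{j\mu} \Big] . \tag{2.50} \]

Writing the trajectories $q_i^\mu(\lambda_i)$ with arbitrary parameters $\lambda_i$ and using the notation
$\dot q_i^\mu = dq_i^\mu/d\lambda_i$, we obtain

\[ S = \sum_i \Big[ -m_i c \int ( \dot q_i^\mu \dot q_{i\mu} )^{1/2}\, d\lambda_i - \sum_{j>i} \frac{e_i e_j}{c} \int\!\!\int \delta\big( (q_i - q_j)^2 \big)\, \dot q_i^\mu \dot q_{j\mu}\, d\lambda_i\, d\lambda_j \Big] . \]

The most noteworthy feature is that there are no diagonal terms in the double sum.
This is the major difference with the Maxwell–Lorentz theory, where the diagonal
terms should normally be present. The contribution of the $i$th particle to the
interaction reads

\[ -\frac{e_i}{c} \int\!\!\int \sum_{j \neq i} e_j\, \delta\big( (q_i - q_j)^2 \big)\, \dot q_{j\mu}\, d\lambda_j\, \dot q_i^\mu\, d\lambda_i = -\frac{1}{c^2} \int j_i^\mu(x)\, A_{i\mu}(x)\, d^4x , \]

where $j_i^\mu$ is the contribution of the $i$th particle to the current (2.45), with the “field”¹¹

\[ A_{i\mu}(x) = \sum_{j \neq i} e_j \int \delta\big( (x - q_j)^2 \big)\, \dot q_{j\mu}\, d\lambda_j . \tag{2.51} \]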


In the Wheeler-Feynman formulation, fields [like (2.51)] would only be introduced 
as a suitable macroscopic description, good for everyday applications of electro- 
magnetism, like handling capacitors. On the fundamental level there is no radiation 
field and there is no radiation. Therefore, opposite charges may orbit each other 
without “losing energy” due to radiation. Famous solutions of that kind are known 
as Schild solutions [16]. 

Equation (2.49) shows that both the advanced and the retarded Green’s functions 
appear in $A_{i\mu}$. But it is the “emission of radiation” which we typically see and which
is solely described by the retarded Green’s function. Wheeler-Feynman and also 
Maxwell—Lorentz electromagnetism are time reversible, i.e., the theory does not fa- 
vor emission before absorption. The typicality of emission has become known as the 
problem of the electromagnetic arrow of time. The original motivation of Wheeler 
and Feynman was to reduce this arrow of time to the thermodynamic one, which 
had been so successfully explained by Boltzmann, by supposing a special initial 
distribution of the particles in the phase space of the universe. (We shall address this 
further in the chapter on probability.) 

Wheeler and Feynman therefore considered the thermodynamic description of 
the particle system, i.e., they considered a distribution of charges throughout the 
universe which “absorb all radiation”. In terms of the theory, this means that the sum 
of the differences of the retarded and advanced forces over all particles vanishes. This
is called the absorber condition. This macroscopic theory is still time-symmetric. 
But supposing further that at some early time the initial distribution of the particles 
was special (non-equilibrium in some sense), then time-directed radiation phenom- 
ena and in particular the observed radiation damping of an accelerated charge are 
reduced to Boltzmann’s explanation of irreversibility (see Sect. 4.2). 


11 We computed this in (2.49), but it is important to understand that this is simply a mathematical
expression, which plays no role unless x is a point on the worldlines of the other particles. There is 
no field in this theory. 
We emphasize that the Wheeler-Feynman electromagnetic theory is a mathematically
consistent relativistic theory with interaction. In fact it is the only such
mathematically well defined and physically relevant theory existing so far, and it is 
about particles, not fields. The statistical mechanics of the theory leads to the well 
known description by electromagnetic fields and smeared out charges and is thus ex- 
perimentally indistinguishable from Maxwell—Lorentz electromagnetism, whenever 
the latter is well defined, i.e., whenever no point charges are considered to generate 
fields. 

The theory is, however, unfamiliar since its dynamical law is not of the familiar 
form. The intuition one has from solving differential equations with Cauchy data 
fails. So why is this? In fact, the Euler-Lagrange equations of (2.50) are not ordinary
differential equations, since advanced and retarded times appear in them.¹²
In contrast, the Maxwell—Lorentz theory is formally of the ordinary type, but with 
the serious drawback that the fields render the equations mathematically undefined 
when they are interpreted as fundamental, i.e., when point charges are considered. 


Remark 2.6. On the Nature of Reality 

Reality is a curious notion. Physics takes the view that something “out there” exists, 
and that the world is “made out of something”. This is not curious at all. But it is 
not so easy to say what it is that the world is made of, since the only access we 
have to the world is through our senses and our thinking, and communication about 
the experience we have. What the world is made of is specified by our physical 
theory about the world, and it is only there where we can see what the world is 
made of. When our physical theory is about point particles and how they move, then 
there are point particles out there — if what the theory says about their motion is 
consistent with our experience, of course. The connection between the entities of 
the theory and our experience is often complicated, often not even spelt out in any 
detailed way. Nevertheless, one has some kind of feel for how it works, and a bit of 
pragmatism in this is alright. 

When we wish to explain a physical phenomenon, we reduce it (in the ideal 
case) to the behavior of the ontological quantities the physical theory is about. In 
Maxwell—Lorentz electromagnetism, fields are ontological. Switch on your radio. 
What better explanation is there than to say that the fields are out there, and they 
get absorbed as radio waves by the radio antenna, and that the radio transforms 
them back into air waves? Music to your ears. But in Wheeler-Feynman electro- 
magnetism, there are no fields and only particles. It explains the music as well. But 
the explanation is different [13, 14]. 

If the Maxwell–Lorentz theory (with point charges) were mathematically consistent,
we could choose between fields and particles as being “real”, or only particles
as being “real”. Since both would describe the macroscopic world as we see it, our 
choice would then have to be made on the grounds of simplicity and beauty of the 
theories. Perhaps in the future we shall find a simpler and nicer theory than the ones 
we have now, one which is solely about fields. Then only fields will be “real”. 


12 It is an intriguing problem of mathematical physics to establish existence and uniqueness of
solutions. 
“Reality” thus changes with the nature of our physical theory, and with it the
elements which can be measured. In the Wheeler-Feynman theory of electromag- 
netism, the electromagnetic field cannot be measured. This is because it is not there. 
It is not part of the theory. That is trivial. Less trivial may seem the understanding 
that the theory also says what elements are there and how those elements can be 
measured. In Maxwell—Lorentz theory, the electric field is measured, according to 
the theory, by its action on charges. 

Here is another point one may think about from time to time. Although all vari- 
ables needed to specify the physical theory are “real”, there is nevertheless a differ- 
ence. In a particle theory, the particle positions are primitive or primary variables, 
representing what may be called the primitive ontology.¹³ They must be there: a particle
theory without particle positions is inconceivable. Particle positions are what
the theory is about. The role of all other variables is to say how the positions change. 
They are secondary variables, needed to spell out the law. We could also say that 
the particle positions are a priori and the other variables a posteriori. An example of 
the latter might be the electric field. In fact, secondary variables can be replaced by 
other variables or can even be dispensed with, as in Wheeler—Feynman electromag- 
netism. Another example which we did not touch upon at all is general relativity, 
which makes the Newtonian force obsolete.


2.6 On the Symplectic Structure of the Phase Space 


With the understanding of tensors and forms, in particular differential forms, not 
only did mathematics move forward, but so did our insight into physics. It
was better appreciated what the objects “really” are. We have an example in our 
relativistic description of electromagnetism (2.43). The electric and magnetic field 
strengths are not vector fields, as one learns in school, but rather components of an 
antisymmetric second rank tensor. Mathematical abstraction helps one to get down 
to the basics of things, and that is satisfying. One such mathematical abstraction is 
symplectic geometry. We shall say a few things here mainly to make sure that fur- 
ther mathematical abstractions do not lead to a deeper understanding of the physics 
which we have presented so far. 

Mathematically deeper than conservation of energy and volume is the symplectic
structure of phase space, which goes hand in hand with Hamilton’s formulation
of mechanics [18]. Symplectic geometry needs a space of even dimensions,
and classical physics provides that. Consider the phase space $\mathbb R^{2n}$ with coordinates
$(q_1,\dots,q_n,p_1,\dots,p_n)$. Given $x,y \in \mathbb R^{2n}$, let $q_i(x)$ be the projection of $x$ on the $q_i$th
coordinate axis. Then


\[ \omega_i^2(x,y) = q_i(y)\, p_i(x) - q_i(x)\, p_i(y) \tag{2.52} \]


13 The notion and role of primitive ontology has long been ignored, but it has recently been
revitalized and emphasized in [17].
is the area of the parallelogram generated by $x,y$ when projected into the $(q_i, p_i)$
plane. Set

\[ \omega^2(x,y) = \sum_{i=1}^{n} \omega_i^2(x,y) = x \cdot (-I) y = \big( (-I) y \big)^t x , \tag{2.53} \]

with

\[ I = \begin{pmatrix} 0_n & +E_n \\ -E_n & 0_n \end{pmatrix} , \]

where $E_n$ is the $n$-dimensional unit matrix, $0_n$ is the $n$-dimensional zero matrix, and
$z^t$ is the transpose of $z$, i.e., the row vector, an element in the dual space of $\mathbb R^{2n}$. The
2-form $\omega^2$, or equivalently the symplectic matrix $I$, defines the symplectic structure
of phase space $\mathbb R^{2n}$ ($n = 3N$ for $N$ particles) and gives (like a scalar product) an
isomorphism between the vector space and its dual. From courses in analysis, we
know that the gradient $\partial f/\partial x$ of a function $f(x)$, $x \in \mathbb R^d$, is in fact a dual element,
i.e., the row vector which acts as a linear map on $h$ according to $(\partial f/\partial x) h$ = row
times column (matrix multiplication by a vector). Now use $\omega^2$ to identify $\partial f/\partial x$
with a vector $\nabla_\omega f$ (using the Euclidean scalar product, this is the normal $\nabla f$). Given
$z \in \mathbb R^{2n}$, the object $\omega^2(\cdot,z)$ is a linear form, i.e., a linear map

\[ \mathbb R^{2n} \longrightarrow \mathbb R , \qquad x \longmapsto \omega^2(x,z) = (-Iz)^t x , \]

which we wish to be equal to $\partial f/\partial x$. We thus search for a $z^f$ such that

\[ \omega^2(x, z^f) = \frac{\partial f}{\partial x}\, x \quad \text{for all } x , \qquad \text{with} \quad z^f = \nabla_\omega f = I (\nabla f) . \]
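
The identities around (2.52) and (2.53) are finite-dimensional linear algebra and can be checked directly. The Python sketch below is a minimal illustration with our own function names, assuming phase space points are stored as lists $(q_1,\dots,q_n,p_1,\dots,p_n)$. It builds $I$, evaluates $\omega^2$ both via the matrix and via the sum of projected areas, and forms the vector $I\nabla f$ associated with a gradient.

```python
def symplectic_matrix(n):
    """I = [[0, +E_n], [-E_n, 0]] as a list of rows."""
    I = [[0.0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        I[k][n + k] = 1.0
        I[n + k][k] = -1.0
    return I


def matvec(A, x):
    """Matrix times column vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean scalar product."""
    return sum(a * b for a, b in zip(x, y))


def omega(x, y):
    """Equation (2.53): omega(x, y) = x . (-I) y."""
    n = len(x) // 2
    return -dot(x, matvec(symplectic_matrix(n), y))


def omega_by_projections(x, y):
    """Sum over i of (2.52): q_i(y) p_i(x) - q_i(x) p_i(y)."""
    n = len(x) // 2
    return sum(y[i] * x[n + i] - x[i] * y[n + i] for i in range(n))


def symplectic_gradient(grad_f):
    """z^f = I grad f, the vector omega associates with the row vector df/dx."""
    n = len(grad_f) // 2
    return matvec(symplectic_matrix(n), grad_f)
```

Applied to the gradient $(\partial H/\partial q, \partial H/\partial p)$ of a Hamilton function, `symplectic_gradient` returns $(\partial H/\partial p, -\partial H/\partial q)$, i.e., the right-hand side of Hamilton's equations.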


The Hamiltonian flow respects the symplectic structure. In particular, it is area
preserving, which means the following. Let $C$ be a closed curve in $\mathbb{R}^{2n}$ and
define the enclosed area as the sum of the $n$ areas which arise from projections of $C$ onto
the coordinate planes $(q_i,p_i)$ [compare with (2.52) and (2.53)]. In the $(q_i,p_i)$ plane,
we have a curve $C_i$ with area $A(C_i)$. The area can be transformed into a line integral by
Stokes’ theorem in two dimensions:


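$$A(C_i) = \int_{A(C_i)} \mathrm{d}q_i\,\mathrm{d}p_i
= \int_{A(C_i)} \operatorname{curl}\begin{pmatrix} p_i \\ 0 \end{pmatrix}\mathrm{d}q_i\,\mathrm{d}p_i
= \oint_{C_i} \begin{pmatrix} p_i \\ 0 \end{pmatrix}\cdot\begin{pmatrix} \mathrm{d}q_i \\ \mathrm{d}p_i \end{pmatrix}
= \oint_{C_i} p_i\,\mathrm{d}q_i , \tag{2.54}$$

where $C_i$ is oriented so that the enclosed area is counted positively.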


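In general,

$$A = \text{area of } C = \oint_C p\cdot\mathrm{d}q = \oint_C \sum_i p_i\,\mathrm{d}q_i .$$

Using differential forms,

$$\omega_i^2 = \mathrm{d}p_i\wedge\mathrm{d}q_i , \qquad \omega^2 = \sum_i \mathrm{d}p_i\wedge\mathrm{d}q_i ,$$

and

$$\omega^2 = \mathrm{d}\omega^1 , \qquad \omega^1 = \sum_i p_i\,\mathrm{d}q_i .$$

Equation (2.54) is nothing other than

$$\int_{A(C_i)} \mathrm{d}\omega^1 = \oint_{C_i} \omega^1 .$$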


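Transporting $C$ with the Hamiltonian flow yields the area $A(t)$, and preservation of
area means that $A(t) = A$. By change of variables, the integration over $A(t)$ can be
expressed by $q(t)$ and $p(t)$ in the integral, so that

$$\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}A(t) &= \frac{\mathrm{d}}{\mathrm{d}t}\oint_C p(t)\cdot\mathrm{d}q(t)
= \oint_C \dot p\cdot\mathrm{d}q + \oint_C p\cdot\mathrm{d}\dot q \\
&= \oint_C \dot p\cdot\mathrm{d}q - \oint_C \dot q\cdot\mathrm{d}p
\qquad \Big[\oint_C \mathrm{d}(\dot q\cdot p) = 0\Big] \\
&= -\oint_C \frac{\partial H}{\partial q}\cdot\mathrm{d}q - \oint_C \frac{\partial H}{\partial p}\cdot\mathrm{d}p \\
&= -\oint_C \mathrm{d}H = 0 .
\end{aligned}$$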


Furthermore, the volume in even-dimensional vector spaces can be thought of as
arising from a product of areas (generalizing the area in $\mathbb{R}^2$ = width times length),
i.e., products of two-forms (2.52) yield the volume form:

$$\omega = \mathrm{d}p_1\wedge\mathrm{d}q_1\wedge\dots\wedge\mathrm{d}p_n\wedge\mathrm{d}q_n ,$$

the Lebesgue measure on phase space (in form language). Then Liouville’s theorem
arises from preservation of area.
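Area preservation can be made tangible with a small numerical experiment. The sketch below is only an illustration under our own assumptions: one degree of freedom, the pendulum Hamiltonian $H = p^2/2 - \cos q$ as an example, a Runge-Kutta approximation of the flow, and a polygon approximation of the closed curve. None of these choices come from the text.

```python
import math

def pendulum_field(q, p):
    """Hamiltonian vector field of H = p^2/2 - cos(q): (dq/dt, dp/dt)."""
    return p, -math.sin(q)

def rk4_flow(q, p, t_end, dt=0.01):
    """Approximate the Hamiltonian flow with classical Runge-Kutta steps."""
    for _ in range(round(t_end / dt)):
        k1q, k1p = pendulum_field(q, p)
        k2q, k2p = pendulum_field(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
        k3q, k3p = pendulum_field(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
        k4q, k4p = pendulum_field(q + dt * k3q, p + dt * k3p)
        q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p

def loop_integral_p_dq(curve):
    """Trapezoidal approximation of the closed line integral of p dq."""
    total = 0.0
    for (q0, p0), (q1, p1) in zip(curve, curve[1:] + curve[:1]):
        total += 0.5 * (p0 + p1) * (q1 - q0)
    return total

# Closed curve C: a circle of radius 0.5 around (q, p) = (1, 0), traversed
# clockwise so that the line integral of p dq is the positive enclosed area.
M = 1000
circle = [(1.0 + 0.5 * math.cos(2 * math.pi * k / M),
           -0.5 * math.sin(2 * math.pi * k / M)) for k in range(M)]

# Transport every point of C with the (approximate) flow up to time t = 2.
moved = [rk4_flow(q, p, t_end=2.0) for q, p in circle]

area_initial = loop_integral_p_dq(circle)
area_moved = loop_integral_p_dq(moved)
print(area_initial, area_moved)   # both close to pi * 0.25
```

The transported curve is visibly deformed, yet the two numbers agree up to the discretization error of the polygon and of the integrator.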


Transformations of coordinates $(q,p)\overset{\psi}{\longrightarrow}(Q,P)$ are called canonical
or symplectic if the Jacobi matrix $\nabla\psi$ is symplectic, which is a variation on the theme
of the orthogonal matrix:

$$(\nabla\psi)^{\mathrm t}\,I\,\nabla\psi = I . \tag{2.55}$$


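They preserve areas and satisfy the canonical equations

$$\begin{pmatrix} \dot Q \\ \dot P \end{pmatrix}
= I\begin{pmatrix} \partial\tilde H/\partial Q \\ \partial\tilde H/\partial P \end{pmatrix} ,
\qquad \tilde H\circ\psi = H .$$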
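For linear transformations the Jacobi matrix is the transformation itself, so condition (2.55) can be tested directly. The following sketch assumes the block form of $I$ with $+E_n$ in the upper right corner; the example matrices and helper names are our own and serve only as an illustration.

```python
import math

def symplectic_matrix(n):
    """Return I = [[0_n, +E_n], [-E_n, 0_n]] as a list of rows."""
    size = 2 * n
    I = [[0.0] * size for _ in range(size)]
    for k in range(n):
        I[k][n + k] = 1.0
        I[n + k][k] = -1.0
    return I

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def is_symplectic(M, tol=1e-12):
    """Check M^t I M = I, i.e., condition (2.55) for a constant Jacobi matrix M."""
    I = symplectic_matrix(len(M) // 2)
    lhs = matmul(matmul(transpose(M), I), M)
    size = len(M)
    return all(abs(lhs[r][c] - I[r][c]) <= tol
               for r in range(size) for c in range(size))

c, s = math.cos(0.7), math.sin(0.7)
rotation = [[c, s], [-s, c]]        # rotation in the (q, p) plane
squeeze = [[2.0, 0.0], [0.0, 0.5]]  # Q = 2q, P = p/2
stretch = [[2.0, 0.0], [0.0, 1.0]]  # Q = 2q, P = p (doubles areas)

# Point transformation for n = 2: Q = A q, P = (A^t)^(-1) p with A = [[1, 1], [0, 1]].
point = [[1.0, 1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, -1.0, 1.0]]

for name, M in [("rotation", rotation), ("squeeze", squeeze),
                ("stretch", stretch), ("point", point)]:
    print(name, is_symplectic(M))
```

For $n = 1$ the condition reduces to $\det M = 1$, which is why the stretch, which doubles areas, fails the test.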


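The Poisson bracket (2.11) is invariant under canonical transformations, because
(2.11) can be written as

$$\{f,g\} = \nabla f\cdot I\,\nabla g ,$$

and if $f(Q,P)$, $g(Q,P)$, and $(Q,P) = \psi(q,p)$ are given, since we have

$$\nabla(f\circ\psi) = \nabla\psi\,(\nabla f\circ\psi) ,$$

it follows that

$$\begin{aligned}
\{f\circ\psi, g\circ\psi\} &= \nabla(f\circ\psi)\cdot I\,\nabla(g\circ\psi) \\
&= (\nabla f\circ\psi)\cdot(\nabla\psi)^{\mathrm t}\,I\,\nabla\psi\,(\nabla g\circ\psi) \\
&= (\nabla f\circ\psi)\cdot I\,(\nabla g\circ\psi) \qquad [\text{by (2.55)}] \\
&= \{f,g\}\circ\psi .
\end{aligned}$$

Clearly,

$$\{q_i,p_j\} = \delta_{ij} , \qquad \{q_i,q_j\} = 0 , \qquad \{p_i,p_j\} = 0 ,$$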


and variables which satisfy this are said to be canonical. Of particular interest are
variables $(Q_1,\dots,Q_n,P_1,\dots,P_n)$, where $P_1,\dots,P_n$ do not change with time, and
where the Hamilton function takes the form

$$\tilde H(Q,P) = \sum_{i=1}^n \omega_i P_i .$$

Then $\dot Q_i = \omega_i$, i.e., $Q_i = \omega_i t + Q_{i,0}$, and $Q_i$ is like the phase of a
harmonic oscillator. Such $(P_i,Q_i)$ are called action-angle variables. Systems which allow
for such variables are said to be integrable, since their behavior in time is in principle
completely under control, with their motion (in the new coordinates) being that of
“uncoupled” harmonic oscillators. The solution can then be found by algebraic manipulation
and integration. The Hamiltonian motions in $\mathbb{R}^2$ ($H$ not time dependent, one particle
in one space dimension) are integrable, since $H$ itself does not change with time, and
hence one may choose $P = H$. However, integrability is atypical for Hamiltonian
systems.
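The harmonic oscillator itself provides the simplest illustration of action-angle variables. The sketch below assumes $H = p^2/2m + m\omega^2q^2/2$ and the standard choice $P = H/\omega$, $Q = \arctan(m\omega q/p)$, for which $\tilde H = \omega P$; the function names and numerical values are ours.

```python
import math

def action_angle(q, p, m, w):
    """Map (q, p) to (P, Q) with P = H / w and Q the phase angle."""
    H = p * p / (2.0 * m) + 0.5 * m * w * w * q * q
    return H / w, math.atan2(m * w * q, p)

def oscillator_flow(q0, p0, m, w, t):
    """Exact solution of dq/dt = p/m, dp/dt = -m w^2 q."""
    c, s = math.cos(w * t), math.sin(w * t)
    return q0 * c + p0 / (m * w) * s, p0 * c - m * w * q0 * s

m, w = 1.5, 2.0
q0, p0 = 1.0, 0.5
t = 0.3

P0, Q0 = action_angle(q0, p0, m, w)
P1, Q1 = action_angle(*oscillator_flow(q0, p0, m, w, t), m, w)

print(P0, P1)        # the action P does not change with time
print(Q1 - Q0, w * t)  # the angle advances linearly: Q(t) = w t + Q(0)
```

In these variables the motion is trivial: $P$ stays fixed and $Q$ grows at the constant rate $\omega$ (modulo $2\pi$).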

Symmetry


This chapter is more a footnote than a chapter on symmetry in physics. It has
intentionally been kept very short and old-fashioned. We wish to say just as much as is
needed to understand that Bohmian mechanics is a Galilean theory of nature. And
we wish to emphasize certain features which may lie buried in more bulky accounts
of symmetries in physics. Our emphasis is on the role of ontology for symmetry.
Ontology is the thing we must hold on to, because that is what the law of physics is 
for. At the end of the day, that is what determines which symmetries there are and 
how they act. 

The invariance of a physical law under a transformation of variables entering
the law defines a symmetry of that law. Invariance means this. Consider the set of
solutions of the dynamical law, i.e., a set of histories $q(t)$, $t\in\mathbb{R}$ (not necessarily
positions of particles, but think of the variables which the theory is primarily
concerned with, i.e., the primitive variables which relate directly to physical reality).
Let $\mathcal{G}$ be a group of transformations. We do not specify them, but we suppose we
know how $g\in\mathcal{G}$ acts on the trajectory $q(t)$ to give a transformed trajectory which
we denote by $(gq)(t)$, where


$$(gq)(t) = \big(g(q_i(t))\big)_i .$$


The law is invariant if for every $g\in\mathcal{G}$ and every $q(t)$, the transformed $(gq)(t)$ is
also in the solution set.¹ The relevant thing to note is that, with the action of $g$ on
q, there may be “strange” actions on secondary (or derived) variables in order for 
the law to be invariant. In other words the group action is not only described by the 
action on the primitive variables, but also by the action on the variables which are 
needed to formulate the law. We shall give examples below. 

A symmetry can be a priori, i.e., the physical law is built in such a way that it 
respects that particular symmetry by construction. This is exemplified by spacetime
symmetries, because spacetime is the theater in which the physical law acts (as long
as spacetime is not subject to a law itself, as in general relativity, which we exclude
from our considerations here), and the law must therefore respect the rules of the theater.
An example is Euclidean space symmetries like rotations. If the theater in which the
physics takes place is Euclidean space (as in Newtonian mechanics), rotations must
leave the physical law invariant, i.e., the physical law must respect the rotational
invariance of Euclidean space.

1 There are two equivalent ways of having the transformations act. Thinking of particle
trajectories, the transformation may act actively, changing the trajectories, or passively, in
which case the trajectories remain unchanged but the coordinate system changes. These
transformations are mutual inverses.

The mathematical formulation of the law can also introduce new (secondary) 
symmetries. Examples of this are the so-called gauge symmetries, which arise when 
a secondary variable can be regarded as belonging to an equivalence class, defined 
by gauge transformations which do not affect the histories of the primitive
variables.² The law is thus only specified up to a choice of gauge. A simple example
is the potential function $V(q)$ in Newtonian mechanics, which can be changed by
an additive constant without changing the “physics”. Another example is given by
$A^\mu$ in (2.44), where all $A^\mu$ which differ by $\partial f/\partial x_\mu$ yield the same
dynamics for the charges, the dynamics being given by $F^{\mu\nu}$, which is insensitive to
such differences.

Finding all symmetries of a given physical theory is technically important, since 
symmetries go hand in hand with conserved quantities (well known in Lagrangian 
formulations of physical laws in terms of Noether’s theorem), which restrict the 
manifold of solutions. Energy (time-shift invariance), angular momentum (rota- 
tional invariance), and momentum (translational invariance) are examples in New- 
tonian physics. 

One should not be afraid to classify symmetries according to their importance. 
Some are of a purely technical character, while some are fundamental. For example,
canonical transformations $(q,p)\mapsto(Q,P)$, which leave Hamilton’s equations
invariant, are invariances of symplectic geometry. These are merely mathematical symmetries,
since the positions of particles are clearly primary or, if one so wishes, fundamental.
The same holds for the unitary symmetry in quantum mechanics. Here part of the description
is encoded in the wave function, which in abstract terms is an element of a vector
space. Since a vector space does not single out any basis, the coordinate
representation³ of the wave function is arbitrary. The arbitrariness of the choice of basis has
become known as unitary symmetry, which is sometimes conceived of as a fun- 
damental symmetry of nature. That is nonsense, of course, since “position” plays 
a fundamental role in our world and breaks the unitary symmetry in favor of the 
position representation of the wave function. But naturally, these are merely words 
unless one knows what position means in the physical theory, and to know that, the 
theory must contain position as a primitive notion. 


2 In quantum field theory, gauge transformations are viewed as a basic ingredient for finding good 
quantum field theories. However, that is not our concern here. 

3 This has to be taken with a grain of salt. For example, the so-called position or momentum 
representations of the wave function are not coordinate representations with respect to basis choices 
for the vector space. They are simply the function and its Fourier transform, which can however 
be connected to self-adjoint operators via the spectral decompositions. This is part of what will be 
explained in this book.

On the other hand the fundamental symmetries are clearly those of spacetime,
since, as we said before, that is the theater in which the physical law acts. When 
one spells out the ontology specifying the primary objects the physical law is about, 
the physical law should be more or less clear, given the spacetime symmetries and 
metaphysical categories such as simplicity and beauty which the law should obey. 

The Galilean spacetime symmetries are the Euclidean ones of three-dimensional 
space (translation, rotation, and reflection invariance) as well as time translation in- 
variance and time reflection invariance — all parts of Galilean invariance. The latter 
expresses Galilean relativity, which uses the notion of inertial system to assert that 
the law must not change under change of inertial system. By definition (based on 
metaphysical insight), inertial systems can move with uniform velocities relative to 
one another, and the Galilean symmetries thus include one more symmetry trans- 
formation, the Galilean boost, which represents the change from one inertial frame 
to another in relative motion. In relativistic four-dimensional spacetime, things are 
simpler, because all symmetries are congruences of Minkowski spacetime given by 
the Minkowski metric. 

To see symmetry and simplicity at work, we shall look for a law for the motion of a
single particle in a one-particle world. First, however, let us see how the symmetries act
in Newtonian mechanics, starting with translations in space. Consider $N$ particles with
positions $q_i$, $i = 1,\dots,N$. Then

$$g(q_i) = q_i' = q_i + a ,$$

and

$$m\ddot q_i' = m\ddot q_i = F_i(q) = F_i(q_1'-a,\dots,q_N'-a) = F_i(q_1',\dots,q_N') ,$$

where the last equality uses the fact that the force is translation invariant, e.g.,

$$F_i(q_1,\dots,q_N) = \sum_{j=1,\,j\neq i}^N F(q_i-q_j) ,$$

otherwise translation invariance would not hold. Hence it follows that the translated
trajectories obey the same law. Let us now consider just one particle, for simplicity
of notation.
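The statement that translated trajectories obey the same law can be tried out numerically. The sketch below is an illustration under our own assumptions: three particles of unit mass on a line, a linear pair force $F(q_i - q_j) = -k(q_i - q_j)$ as the translation invariant example, a force $-kq_i$ tied to the origin as the counterexample, and a velocity-Verlet integrator. All names are hypothetical.

```python
def simulate(q0, v0, force, m=1.0, dt=1e-3, steps=2000):
    """Integrate m * d^2q/dt^2 = F(q) with the velocity-Verlet scheme."""
    q, v = list(q0), list(v0)
    a = [f / m for f in force(q)]
    for _ in range(steps):
        q = [qi + vi * dt + 0.5 * ai * dt * dt for qi, vi, ai in zip(q, v, a)]
        a_new = [f / m for f in force(q)]
        v = [vi + 0.5 * (ai + an) * dt for vi, ai, an in zip(v, a, a_new)]
        a = a_new
    return q

def pair_force(q, k=1.0):
    """Translation invariant: F_i depends only on the differences q_i - q_j."""
    return [sum(-k * (qi - qj) for j, qj in enumerate(q) if j != i)
            for i, qi in enumerate(q)]

def pinned_force(q, k=1.0):
    """Not translation invariant: every particle is tied to the origin."""
    return [-k * qi for qi in q]

q0 = [0.0, 1.0, 3.0]
v0 = [0.2, -0.1, 0.0]
shift = 5.0
q0_shifted = [qi + shift for qi in q0]

def max_deviation(force):
    """Largest deviation of the shifted run from 'original run plus shift'."""
    original = simulate(q0, v0, force)
    shifted = simulate(q0_shifted, v0, force)
    return max(abs(s - o - shift) for s, o in zip(shifted, original))

print(max_deviation(pair_force))    # essentially zero
print(max_deviation(pinned_force))  # of order one
```

With the pair force the shifted initial data simply produce the shifted trajectory; with the pinned force they do not, because the origin is singled out.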

Let us look quickly at time-shift invariance, i.e., invariance under $t\mapsto t+a$: the
shifted trajectory $q'(t) = q(t+a)$ is again a solution if $F$ does not depend on time. Then
there are orthogonal transformations $R$ of $\mathbb{R}^3$, i.e., transformations which preserve
the Euclidean scalar product, so that $\det R = \pm 1$, $RR^{\mathrm t} = E_3$. For these,

$$q' = Rq ,$$

and

$$m\ddot q' = mR\ddot q = RF(q) = RF(R^{\mathrm t}q') = F(q') ,$$

if $F$ transforms like a vector, that is, if

$$F(R^{\mathrm t}\,\cdot\,) = R^{\mathrm t}F(\,\cdot\,) .$$


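For $F = b$ = constant, this does not hold, i.e., $Rb\neq b$. But $\nabla = (\partial/\partial q)^{\mathrm t}$
transforms like a vector ($\nabla$ is to be thought of as a column vector, while
$\partial/\partial q$ is a row vector):

$$\frac{\partial}{\partial q'} = \frac{\partial}{\partial q}\frac{\partial q}{\partial q'}
= \frac{\partial}{\partial q}R^{\mathrm t}
\quad\Longrightarrow\quad
\nabla' = \Big(\frac{\partial}{\partial q}R^{\mathrm t}\Big)^{\mathrm t} = R\nabla .$$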


Now let $V$ be a scalar function, i.e.,

$$V'(q') = V(R^{\mathrm t}q') = V(q) ,$$

which is $R$ invariant: $V'(q') = V(q')$ or $V(Rq) = V(q)$. Then for $F(q) = \nabla V(q)$,

$$m\ddot q' = mR\ddot q = R\nabla V(q) = \nabla'V'(q') = F(q') .$$
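The vector property $F(Rq) = RF(q)$, which is the same condition as $F(R^{\mathrm t}\,\cdot\,) = R^{\mathrm t}F(\,\cdot\,)$ applied to $R^{\mathrm t}$, can be checked for concrete forces. The sketch below assumes the rotation invariant potential $V(q) = -1/|q|$ as an example and compares it with a constant force; the names and sample values are our own.

```python
import math

def rotation_z(angle):
    """Rotation about the third axis (det R = +1)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

REFLECTION = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]  # det R = -1

def apply(R, v):
    """Matrix times column vector."""
    return [sum(r * c for r, c in zip(row, v)) for row in R]

def grad_V(q):
    """Gradient of V(q) = -1/|q|, i.e., q / |q|^3."""
    r = math.sqrt(sum(c * c for c in q))
    return [c / r ** 3 for c in q]

def constant_force(q):
    """F = b = constant, here b = (1, 0, 0)."""
    return [1.0, 0.0, 0.0]

def transforms_like_vector(F, R, q, tol=1e-12):
    """Check F(Rq) = R F(q) at the point q."""
    lhs = F(apply(R, q))
    rhs = apply(R, F(q))
    return all(abs(a - b) <= tol for a, b in zip(lhs, rhs))

q = [0.3, -1.2, 0.8]
R = rotation_z(0.9)

print(transforms_like_vector(grad_V, R, q))           # True
print(transforms_like_vector(grad_V, REFLECTION, q))  # True
print(transforms_like_vector(constant_force, R, q))   # False, since Rb != b
```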


The Newtonian gravitational potential is a natural example, since it is a solution of
the simplest Galilean invariant equation, the potential equation
$\Delta V = \nabla\cdot\nabla V = 0$, outside the mass distribution, with the boundary condition
that $V$ vanishes at infinity. The special feature of Newtonian dynamics, i.e., to be of
second order (with laws of the form $\ddot q = \dots$), has not played any role up to now. It
does so when one considers the change to a relatively moving inertial system (a Galilean
boost):

$$q' = q + ut . \tag{3.1}$$


Clearly $\ddot q' = \ddot q$ and one sees how simply the invariance of the law under boosts (3.1)
arises (for translation invariant $F$). This seems to suggest that a first order theory
where the particle law has the form $\dot q = \dots$ should have problems with Galilean
relativity. One would be inclined to think that one ought to have an equation for $\ddot q$
in order to make the law invariant. Here is a little elaboration on that idea.

Let us devise a Galilean theory for one particle. We start with a theory of second
order:

$$\ddot q = F(q)
\quad\overset{\text{translation invariance}}{\Longrightarrow}\quad F = \text{const.}
\quad\overset{\text{rotation invariance}}{\Longrightarrow}\quad F = 0 ,$$

whence $\ddot q = 0$ is the only possibility, and that is Galilean invariant. Now try a first
order theory:

$$\dot q = v(q,t)
\quad\overset{\text{translation invariance}}{\Longrightarrow}\quad v = \text{const.}
\quad\overset{\text{rotation invariance}}{\Longrightarrow}\quad v = 0 ,$$

whence $\dot q = 0$ would be the law, i.e., no motion. But that is not Galilean invariant!
Thus one may wish to conclude that first order theories (which could be called
Aristotelian, according to the Aristotelian idea that motion is only guidance toward
a final destination) cannot be Galilean invariant. We already know that this cannot
be right. The Hamilton-Jacobi formulation of Newtonian mechanics, which is a
first order theory, must be Galilean invariant. We do not want to pursue this further
here, but the following aspect of symmetry transformations is helpful if one wants
to understand why Hamilton-Jacobi is Galilean invariant.
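As a brief numerical aside on the boost (3.1): it shifts velocities by $u$ but leaves accelerations untouched, which is why a second order law with translation invariant force survives the boost. The sketch below assumes two unit masses on a line joined by a linear spring, for which an exact solution is known, and uses finite differences; the setup and names are our own illustration.

```python
import math

U = 3.0  # boost velocity u in q' = q + u t

def trajectory(t):
    """Exact solution for two unit masses with force -(q_i - q_j),
    centre of mass at rest: q1 - q2 = cos(sqrt(2) t)."""
    r = math.cos(math.sqrt(2.0) * t)
    return [0.5 * r, -0.5 * r]

def boosted(t):
    """The same history seen from a relatively moving inertial system."""
    return [qi + U * t for qi in trajectory(t)]

def velocity(path, t, h=1e-3):
    """Central difference approximation of dq/dt."""
    return [(a - b) / (2 * h) for a, b in zip(path(t + h), path(t - h))]

def acceleration(path, t, h=1e-3):
    """Central difference approximation of d^2q/dt^2."""
    return [(a - 2 * b + c) / h ** 2
            for a, b, c in zip(path(t + h), path(t), path(t - h))]

def pair_force(q):
    """Translation invariant force on each of the two particles."""
    return [-(q[0] - q[1]), -(q[1] - q[0])]

def newton_residual(path, t):
    """Largest |acceleration - F(q)| (unit masses); near zero for solutions."""
    return max(abs(a - f)
               for a, f in zip(acceleration(path, t), pair_force(path(t))))

t = 0.8
print(newton_residual(trajectory, t), newton_residual(boosted, t))
print([vb - v for vb, v in zip(velocity(boosted, t), velocity(trajectory, t))])
```

Both histories satisfy the second order law, while their velocities differ by $u$. A law that fixes $\dot q$ as a function of $q$ alone therefore cannot hold for both.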


We want to discuss time-reversal symmetry: $t\mapsto -t$, $q(t)\mapsto q'(t) = q(-t)$.
Clearly, $q'(t)$ is a solution of the Newtonian equations, since $(\mathrm{d}t)^2 = (-\mathrm{d}t)^2$.
Again the form $\ddot q = \dots$ for the law is responsible for this invariance. But now move to a
phase space description in terms of $(q,p)$. We have

$$\dot q = \frac{p}{m} , \qquad \dot p = F(q) ,$$

and time-reversal invariance follows if $t\mapsto -t$ goes along with $(q,p)\mapsto(q,-p)$
(clearly $p' = -p$ since velocities must be reversed).
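Time-reversal invariance in phase space can be illustrated by running a trajectory forward, reversing the momentum, and running it again for the same time. The sketch below assumes the first order equations $\dot q = p/m$, $\dot p = F(q)$ with the example force $F(q) = -\sin q$ and a velocity-Verlet integrator, which is itself time reversible; these choices and all names are ours.

```python
import math

def force(q):
    """Example force F(q) = -sin(q) (a pendulum); any F(q) would do."""
    return -math.sin(q)

def evolve(q, p, t_end, m=1.0, dt=1e-3):
    """Velocity-Verlet integration of dq/dt = p/m, dp/dt = F(q)."""
    for _ in range(round(t_end / dt)):
        p_half = p + 0.5 * dt * force(q)
        q = q + dt * p_half / m
        p = p_half + 0.5 * dt * force(q)
    return q, p

q0, p0 = 0.4, 1.3
q1, p1 = evolve(q0, p0, 5.0)

# Time reversal: keep q, flip p, and let the same law act for the same time.
q_back, p_back = evolve(q1, -p1, 5.0)

# Without flipping p the system just continues and does not retrace its path.
q_on, p_on = evolve(q1, p1, 5.0)

print(q_back, -p_back)  # recovers (q0, p0)
print(q_on, p_on)       # a different phase point
```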

It is worth saying this more generally. Let the phase point be given by $X$, and let
$t\mapsto -t$ come along with $X\mapsto X^*$, where the asterisk denotes an involution
$(X^*)^* = X$ (like multiplication by $-1$, or, as another example, complex conjugation). That
is, the asterisk denotes a representation of time reversal. Invariance holds if
$X'(t) = X^*(-t)$ is a solution of the dynamics whenever $X(t)$ is a solution. The important
fact to note is that the way this operation acts depends on the role the variables
play in the physical law. The primitive variables usually remain unchanged, while
secondary variables, those whose role is “merely” to express the physical law for
the primitive variables, may change in a “strange” way. A well-known example is
Maxwell-Lorentz electrodynamics. The state is $X = (q,\dot q,E,B)$ and

$$(q,\dot q,E,B)^* = (q,-\dot q,E,-B) ,$$

which follows from the Lorentz force equation (2.37). It is clear that $\dot q\mapsto -\dot q$, but
then $B$ must follow suit to make the equation time-reversal invariant. The lesson
is that some variables may need to be changed in a strange way for the law to be 
invariant. But as long as those variables are secondary, there is nothing to worry 
about. 
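The need to reverse $B$ together with the velocity can be seen in a small experiment with a single charge in fixed uniform fields. The sketch below assumes the Lorentz force in the form $\ddot q = (e/m)(E + \dot q\times B)$ with $e/m = 1$ and units in which $c = 1$, which may differ from the convention of (2.37), and a Runge-Kutta integrator; field values and names are hypothetical.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def add_scaled(a, b, s):
    """Return a + s * b componentwise."""
    return [x + s * y for x, y in zip(a, b)]

def evolve(q, v, E, B, dt=1e-3, steps=2000):
    """Runge-Kutta integration of dq/dt = v, dv/dt = E + v x B (e/m = 1)."""
    def deriv(q, v):
        return v, add_scaled(E, cross(v, B), 1.0)
    for _ in range(steps):
        k1q, k1v = deriv(q, v)
        k2q, k2v = deriv(add_scaled(q, k1q, dt / 2), add_scaled(v, k1v, dt / 2))
        k3q, k3v = deriv(add_scaled(q, k2q, dt / 2), add_scaled(v, k2v, dt / 2))
        k4q, k4v = deriv(add_scaled(q, k3q, dt), add_scaled(v, k3v, dt))
        q = [q[i] + dt / 6 * (k1q[i] + 2 * k2q[i] + 2 * k3q[i] + k4q[i])
             for i in range(3)]
        v = [v[i] + dt / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i])
             for i in range(3)]
    return q, v

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q0, v0 = [0.0, 0.0, 0.0], [1.0, 0.0, 0.2]
E, B = [0.0, 0.1, 0.0], [0.0, 0.0, 1.0]

q1, v1 = evolve(q0, v0, E, B)
v1_reversed = [-c for c in v1]
B_reversed = [-c for c in B]

# Full time reversal (q, -v, E, -B): the particle retraces its path.
q_back, v_back = evolve(q1, v1_reversed, E, B_reversed)
# Reversing only the velocity: the particle ends up somewhere else.
q_wrong, v_wrong = evolve(q1, v1_reversed, E, B)

print(distance(q_back, q0), distance(q_wrong, q0))
```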

Let us close with a final remark on time-reversal invariance. One should ask why 
we are so keen to have this feature in the fundamental laws when we experience the 
contrary. Indeed, we typically experience thermodynamic changes which are irre- 
versible, i.e., which are not time reversible. The simple answer is that our platonic 
idea (or mathematical idea) of time and space is that they are without preferred 
direction, and that the “directed” experience we have is to be explained from the 
underlying time symmetric law. How can such an explanation be possible? This is 
at the same time both easy and confusing. Certainly the difference in scales is of 
importance. The symmetry of the macroscopic scale can be different from that of 
the microscopic scale, if the “initial conditions” are chosen appropriately. This will 
be further discussed in Sect. 4.2.

Physical laws do not usually involve probabilities.¹ The better known physical
theories are deterministic and defined by differential equations (like the Schrödinger
equation), so that a system’s evolution is determined by “initial data”. In a
deterministic theory (like Newtonian mechanics or Maxwell-Lorentz electromagnetism), the
physical world simply evolves like clockwork. Yet we experience randomness — 
chance seems to be the motor of many physical events, like the zigzag trajectories 
of a Brownian particle or the random dots left by particles which arrive at the screen 
in a two slit experiment. Chance determines whether a coin falls heads or tails — we 
cannot predict the actual outcome. How should we explain that in a deterministic 
world? 

Chance, randomness, and probability are words one uses lightly, but their
meaning is cloudy, and to many they seem unfit for the mathematical precision one
expects of nature’s laws. The way randomness manifests itself is something of an
oxymoron, for there is lawlike behavior in random events, as attested by the law of
large numbers, asserting for example that heads and tails typically come up equally
often in the long run. On the other hand we describe coin tossing by complete lack 
of knowledge of what any single outcome will be. Heads or tails each come with 
probability 1/2. The main theme of this chapter will be to answer the following two 
summarizing questions, once asked by Marian von Smoluchowski [3]: 


1. How can it be that the effect of chance is computable, i.e., that random causes
can have lawlike effects?²

2. How can randomness arise if all events are reducible to deterministic laws of
nature? Or in other words, how can lawlike causes have random effects?³


1 An example of a physical law based on probability is provided by the so-called spontaneous
collapse theory (or GRW theory). Here Schrödinger’s equation is replaced by a jump process (or
diffusion) equation, in which a macroscopic superposition of wave packets reduces quickly under
the time evolution to a collapsed wave function in which only one of the packets remains. This is
a possible remedy for the measurement problem of quantum mechanics [1], which may also relieve
the tension between relativity and the nonlocality of quantum mechanics [2].

2 In the German original: Wie ist es möglich, daß sich der Effekt des Zufalls berechnen lasse,
daß also zufällige Ursachen gesetzmäßige Wirkungen haben?


When correctly viewed, the answers are not difficult to find. Indeed, the answer to 
both is typicality, and this answer has in principle been known since the work of 
Ludwig Boltzmann (1844-1906). We say “correctly viewed” because, for various 
reasons, Boltzmann’s probabilistic reasoning did not meet with the same acceptance 
as the calculus of Newton and Leibniz, for example. 

The various reasons are difficult to sort out and this is certainly not the place 
to speculate about historical shortcomings. Some mathematical objections to Boltz- 
mann’s work (physically not serious at all) were clearly much overrated, and cast 
their shadows well into the twentieth century. The general idea behind Boltzmann’s 
work, namely how to reason probabilistically within deterministic physics, was 
overshadowed by Boltzmann’s explanation of irreversibility, the part of his work 
which became most famous and was most strongly attacked. However, the main 
attack against Boltzmann — which presumably hit more painfully than any tech- 
nical bickering — was that Boltzmann reopened the pre-Socratic atomistic view of 
the world advocated by Democritus, Leucippus, and others — now with the rigor of 
mathematical language, reducing all phenomena to the motion of the unseen atoms. 
Boltzmann’s contemporaries could easily dismiss this world view, which was es- 
sential to all of Boltzmann’s thinking, by declaring: We do not believe in what we 
do not see. But for Boltzmann, as for the pre-Socratic physicists, what counted was 
the explanatory value, the gain of insight provided by atomism, explaining logically 
from few basic principles what we do experience with our gross senses — so never 
mind whether we can “see” atoms. 

The attacks against Boltzmann’s reductionism ended immediately with Einstein’s 
and Smoluchowski’s work on Brownian motion, the erratic motion of microscopi- 
cally small particles suspended in a fluid, which are nevertheless visible through a 
microscope. This motion had been observed since the invention of the microscope in 
the 17th century, but it was Einstein and Smoluchowski who, using Boltzmann’s re- 
ductionistic view, explained the erratic motion as arising from molecular collisions 
of the molecules in the fluid with the Brownian particles (see Sect. 5). The atoms in 
heated motion were visible after all. Boltzmann committed suicide in 1906 in Duino 
near Trieste, Italy. 

We shall spend much time on this subject, and Boltzmann’s view of it, because 
we shall apply the ideas to quantum mechanics, where probability commonly enters 
as an axiom. To see whether that axiom is sensible or not, it is good to understand 
first what probability is. 


3 In the German original: Wie kann der Zufall entstehen, wenn alles Geschehen nur auf
regelmäßige Naturgesetze zurückzuführen ist? Oder mit anderen Worten: Wie können
gesetzmäßige Ursachen eine zufällige Wirkung haben?

4.1 Typicality


The answer to the question: “What is geometry” is rather simple. It is the doctrine of 
spatial extension and its elements are objects like points or straight lines. The answer 
to the question: “What is analysis” is also simple, but less intuitive, since it involves 
the doctrine of limiting processes and the handling of infinity. The elements are the 
continuum and the real numbers. What is a real number? A real number between 0
and 1 is an infinite sequence consisting of digits $0,1,2,\dots,9$ like $0.0564319\ldots$. In
the binary representation, one only has the digits 0 and 1, and every number between
0 and 1 is a sequence of 0s and 1s, e.g.,

$$0.1001\ldots = 1\times\frac{1}{2} + 0\times\frac{1}{4} + 0\times\frac{1}{8} + 1\times\frac{1}{16} + \dots$$

What is probability theory? The doctrine of chance or randomness? But what are 
chance and randomness? They are what probability theory is about. Simply playing 
with different words is not helpful. What characterizations of chance can be found? 
Unpredictability, not knowing what will happen? As Henri Poincaré (1854-1912) 
once said [4]: “If we knew all, probability would not exist.” But knowledge and ig- 
norance are very complex notions and certainly not comparable to the primitivity of 
a straight line or a real number, and one cannot imagine how these complex notions 
could be seen as fundamental objects of a theory of probability. Most importantly, 
however, we wish to apply probability theory to a physical system, and its behavior 
obviously has nothing to do with what we know or do not know about the system. 
Yet physical systems do often behave in a random way (the famous erratic Brownian 
motion, for example) and this is what we wish to understand. 

So what is probability theory? It is the doctrine of typical behavior (of physical
systems). The meaning of typicality is easy to understand. Typically one draws
blanks in a lottery, because there are so many more blanks than prizes, as can be
ascertained by simple counting. Typically, in a run of $N$ coin tosses, one obtains an
irregular sequence of heads and tails, which is nevertheless regular in the numbers
of heads and tails. In fact, these numbers are more or less equal if $N$ is large enough.
Why is this? It is because, for large $N$, the number of head-tail sequences of length
$N$ (which equals the number of sequences of 0s and 1s of length $N$) with more or
less equal numbers of heads and tails is so enormously bigger than the number of
sequences that have markedly more heads than tails, or vice versa (“allowed” fluctuations
are of the order of $\sqrt{N}$). In fact the number of all sequences is $2^N$ and the number of
sequences with exactly $K$ heads is given by the binomial coefficient $\binom{N}{K}$. For large
$N$, one can use Stirling’s formula

$$N! = \Big(\frac{N}{e}\Big)^N\sqrt{2\pi N}\,\Big[1 + O\Big(\frac{1}{N}\Big)\Big] \tag{4.1}$$


to evaluate the binomial coefficient, and a quick estimate shows that the maximum
number is

$$\binom{N}{N/2} \approx \frac{2^{N+1}}{\sqrt{2\pi N}} .$$

But summing over all possible sequences, we have

$$\sum_{K=0}^N \binom{N}{K} = 2^N ,$$

so that the numbers of sequences with $K$ such that $|K - N/2| < \sqrt{N}$ already sum up
roughly to the total number $2^N$. This is essentially the $\sqrt{N}$ fluctuation law, and one
can imagine that for huge $N$ the difference in numbers is exponential if $K$ exceeds
the size of the fluctuation (an easy exercise).
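These counting statements are easy to try out with exact integer arithmetic. The following sketch uses Python’s `math.comb`; the values of $N$ and the function names are our own choices for illustration.

```python
import math

def fraction_near_half(N):
    """Fraction of 0-1 sequences of length N with |K - N/2| < sqrt(N)."""
    near = sum(math.comb(N, K) for K in range(N + 1)
               if (2 * K - N) ** 2 < 4 * N)
    return near / 2 ** N

def fraction_far_from_half(N, c):
    """Fraction with |K - N/2| >= c * sqrt(N), for a positive integer c."""
    far = sum(math.comb(N, K) for K in range(N + 1)
              if (2 * K - N) ** 2 >= 4 * c * c * N)
    return far / 2 ** N

def stirling_ratio(N):
    """Exact central binomial coefficient divided by 2^(N+1) / sqrt(2 pi N)."""
    return math.comb(N, N // 2) / 2 ** (N + 1) * math.sqrt(2 * math.pi * N)

print(stirling_ratio(1000))             # close to 1
print(fraction_near_half(2500))         # most sequences lie within sqrt(N) of N/2
print(fraction_far_from_half(2500, 5))  # deviations of 5 sqrt(N) are extremely rare
```

Already for $N = 2500$ the sequences within $\sqrt{N}$ of equal numbers of heads and tails make up the bulk of all $2^N$ sequences, while those deviating by $5\sqrt{N}$ or more form a vanishingly small fraction.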

What about typicality in real physics? Consider a system of gas molecules in a 
container. Let us assume that, at this moment, the gas molecules fill the container 
in a homogeneous way. In our gross way of looking at things, we do not see the 
individual gas molecules, but we experience a homogeneous density of the gas in 
the container. We experience a macrostate of the gas. The detailed state of the gas, 
which lists all positions and momenta of all gas molecules, i.e., the phase space 
point of the gas system, is called a microstate. The microstate obviously changes 
with time, since the phase point wanders through phase space (see Fig. 2.3). But in 
our gross way of looking at things, we do not see that in the macrostate. So why 
does the macrostate remain? For the macrostate to change in a detectable way, the 
microstate has to wander into very particular regions of phase space. Why does 
the phase point (the microstate) not move into that particular region of phase space 
where, for example, all molecules are in the right half of the container? That we 
would of course feel with our gross senses. Suppose the container were a lecture 
hall and the air molecules all went into the right half of the lecture hall. Then all the 
students in the left half of the hall would be without air (see Fig. 4.1). So why are 
we lucky enough never to experience that? The answer is that it is atypical. 

It is atypical because the number of microstates making up a macrostate which 
looks like the homogeneous density gas is so overwhelmingly larger than the num- 
ber of microstates which make up a macrostate with a gross imbalance in the 
density profile. To get a quick feel for the sizes, number all the gas molecules
$1,\dots,N$, where $N\sim 10^{24}$, and distribute Rs (the molecule is on the right) and Ls
(the molecule is on the left) over the gas molecules. Then it is again a question of
counting 0-1 sequences of length $N$ with $N$ huge. In fact, these numbers give good
estimates of the phase space volume of phase points corresponding to equipartition 
and non-equipartition of molecules. 
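To get a feel for the orders of magnitude, one can work with logarithms of the fractions. The sketch below assumes $N = 10^{24}$ and uses Hoeffding’s inequality, $2\exp(-2\varepsilon^2N)$, as an upper bound for the fraction of R-L assignments whose number of Rs deviates from $N/2$ by at least $\varepsilon N$. The bound and the value of $\varepsilon$ are our additions for illustration.

```python
import math

def log10_fraction_all_right(N):
    """log10 of 2^(-N): the fraction of assignments with every molecule on the right."""
    return -N * math.log10(2.0)

def log10_imbalance_bound(N, eps):
    """log10 of Hoeffding's bound 2 exp(-2 eps^2 N) on the fraction of
    assignments with |number of Rs - N/2| >= eps * N."""
    return (math.log(2.0) - 2.0 * eps * eps * N) / math.log(10.0)

N = 1e24

# Small case: 4 objects, 2-2 split versus 4-0 split.
print(math.comb(4, 2), math.comb(4, 4))

# All molecules in the right half: a fraction of about 10^(-3 * 10^23).
print(log10_fraction_all_right(N))

# Even a relative imbalance of one part in 10^9 is fantastically rare.
print(log10_imbalance_bound(N, 1e-9))
```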

This way of looking at probability is due to the physicist Ludwig Boltzmann 
(1844-1906), who perfected the kinetic gas theory of Rudolf Clausius (1822-1888) 
and James Clerk Maxwell (1831-1879). According to Boltzmann, what is happen- 
ing is the typical scenario under the given constraints. Boltzmann’s statistical rea- 
soning is superior to physical theories, in the sense that it does not depend on the 
particular physical law. In the following we consider Newtonian mechanics in order
to establish a connection with common notions, and later we shall apply the ideas
to Bohmian mechanics to explain where quantum probability comes from.

[Figure: a gas system and its phase space, with regions labelled “typical” and “atypical”]

Fig. 4.1 There are 6 ways of distributing 4 objects between two containers so that there are 2
objects in each container. There is only 1 possibility for a 4-0 distribution. This difference in
numbers (6 versus 1) grows extremely fast with the number of objects, like the gas molecules in the
gas system. The typical phase points make up most of the phase space (light grey). Atypical phase
points (corresponding to configurations where all molecules are in the left half of the container)
make up only a very tiny fraction of the phase space (the darker edges). In fact, the region of
atypicality is not concentrated as depicted, but is mixed up with the region of typicality as depicted
in Fig. 4.9.

Note in passing that the word “typical” is synonymous with “overwhelmingly
many”.⁴ However, if one takes counting to heart, there is a slight difficulty here.
The number of microstates of a gas system is uncountable, like the number of real 
numbers, since the positions and velocities vary over a continuum of values. In our 
example of air filling either half or the whole room, it is clear that the numbers of 
microstates are in both cases uncountable. How can we tell what is more and what is 
less in that case? We need a generalisation of counting, and that generalisation is the 
“size” or “volume” of the set of microstates. The mathematical doctrine of modern 
probability theory is based on that idea. Probability theory uses a natural notion of 
the content of sets, called a probability measure, where the emphasis should be put 
on measure rather than probability. Overwhelmingly many phase points will then 
mean that the relevant phase points make up a set of very large measure. 

All this will be formalized later. For the moment it is more important to realize 
that something objective is being said about the physics of systems, when one says 
that a system behaves typically. It behaves in the way that the trajectories for the 
overwhelming majority of initial conditions of the system behave. Because of this 
we can make predictions for complex systems without the need to compute detailed 
trajectories. And it is exactly that role that chance plays in physics. 


4 Instead of “overwhelmingly many”, Boltzmann also used the notion of “most probable”. But the 
Ehrenfests [5], who described Boltzmann’s ideas, used only the notion of “overwhelmingly many”.

4.1.1 Typical Behavior. The Law of Large Numbers


Shortly before his death, the physicist Marian von Smoluchowski⁵ (1872-1917)
wrote the article [3], from which we quoted the two introductory questions above:


1. How can it be that the effect of chance is computable, i.e., that random causes
can have lawlike effects?

2. How can randomness arise if all events are reducible to deterministic laws of 
nature? Or in other words, how can lawlike causes have random effects? 


Smoluchowski develops the answer to the first question. He does not answer the 
second question, and we shall answer it in the sense of Boltzmann. 

Starting with the first question, let us dwell a while on what he meant by lawlike 
effects. In fact, he was referring to the law of the empirical mean, or in modern 
terms, the law of large numbers. The chance which can be calculated reveals itself 
in the predicted relative frequencies, i.e., in the predicted empirical means, which 
one obtains in the irregular outcomes of long runs of an experiment like coin tossing. 

Smoluchowski discusses the physical conditions which allow for such a law- 
like irregular sequence of outcomes in simple dynamical situations. Basic to these 
is instability of motion or as Smoluchowski puts it: small cause, big effect. Small 
fluctuations in the initial conditions yield completely different outcomes. But is it 
not surprising that amplification of small fluctuations should be responsible for the 
lawlike behavior of chance? Well, one must not forget that we look for lawlike be- 
havior in a long irregular sequence of outcomes, and what instability can do here 
is to weaken influences of the past outcomes on the future outcomes. So in a cer- 
tain sense, the more random the sequence is, the more simply we find the lawlike 
behavior we are looking for. 

An example is provided by Galton’s board (Francis Galton 1822-1911). A ball 
falls through an arrangement of n rows of nails fixed on a board, where the horizontal 
distance between two nails is only slightly bigger than the diameter of the ball (see 
Fig. 4.2). At the bottom of the board, one has n boxes, numbered from 1 to n, and
every box marks a possible ending position of the ball. A ball ends its run in one of 
the boxes. After many runs, one obtains a distribution of the number of balls over 
the boxes. Suppose we let N balls drop, then if N is large, we can more or less say 
what the distribution of the number of balls will look like. The number of balls in 
box m will be about

$$N\,\frac{1}{2^n}\binom{n}{m}. \tag{4.1}$$

In other words, the empirical frequency for balls ending in box m is about $(1/2^n)\binom{n}{m}$.


The usual argument is of course that there are $2^n$ possible trajectories: R(ight)-L(eft)
sequences of length n with randomly distributed Rs and Ls, and the ball ends in box 


5 Smoluchowski discovered the molecular explanation of Brownian motion independently of Einstein,
thereby opening the door for modern atomism.


Fig. 4.2 Galton’s Board. The figure shows the run of one ball 


m if there are exactly m Rs in the sequence. There are $\binom{n}{m}$ such sequences, all sequences
are equally probable, and hence $(1/2^n)\binom{n}{m}$ is the probability of ending up in
box m. Thus in a series of many runs, by the law of large numbers, we get the relative
frequencies as described. Equivalently, we can argue that the L-R decisions are independent,
each occurring with probability 1/2, and then ask what is the probability
for m Rs in n independent identically distributed Bernoulli trials. These arguments 
are all quite accurate, but they are not the end of the story. The arguments use words 
like randomness, probability, and independence and it is the meaning of these words 
that we need to understand. What exactly is the status of the above argument? What 
exactly do the words mean? 
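The usual argument can be tried out numerically. The following is a minimal sketch, assuming that each row is modeled by a fair pseudo-random left/right choice, which is exactly the assumption whose status is questioned here. The function names, the seed, and the choice to label boxes by the number m of right moves (m = 0, ..., n) are ours and are made only for illustration.

```python
import random
from math import comb


def predicted_frequencies(n):
    """Binomial weights (1/2^n) * C(n, m) for m = 0..n right moves."""
    return [comb(n, m) / 2**n for m in range(n + 1)]


def simulate_galton(n, num_balls, seed=0):
    """Drop num_balls balls through n rows, each row being a fair L/R choice.

    Returns the relative frequency of each possible number m of right moves.
    """
    rng = random.Random(seed)  # seeded so that the run is reproducible
    counts = [0] * (n + 1)
    for _ in range(num_balls):
        m = sum(rng.randint(0, 1) for _ in range(n))  # number of Rs in this run
        counts[m] += 1
    return [c / num_balls for c in counts]


if __name__ == "__main__":
    n, N = 8, 20000
    empirical = simulate_galton(n, N)
    predicted = predicted_frequencies(n)
    for m, (emp, pred) in enumerate(zip(empirical, predicted)):
        print(f"m={m}: empirical {emp:.4f}, predicted {pred:.4f}")
```

The sketch presupposes independence and equal weights by construction, so it illustrates the claim rather than explaining it.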

Let us follow Smoluchowski. He assumes that the dynamics of the Galton board 
is Newtonian. Then he observes the following. The ball enters the Galton board 
with a “tiny uncertainty” concerning the position of its center (see Fig. 4.4). That 
means that the ball collides almost but not completely centrally with the first nail, 
so the ball goes either to the right or left nail slit. Suppose it goes to the right. Then 
the ball must pass the right nail slit. Now the whole secret of the physics of the 
Galton board lies in what happens during the passage through a nail slit. In fact, 
during this passage, an enormous number of (slightly inelastic) collisions take place 
between the ball and the two adjacent nails, as happens in a pinball machine! The 
effect of the many collisions is that very tiny changes in the incoming position and 
velocity of the ball, when it enters the slit, lead to drastically different outcomes for 
Fig. 4.3 (A) Two very close trajectories of a pointlike ball become separated by virtue of the
collision with the round surface of the nail. (B) The effect is enhanced through the large number 
of collisions. The more collisions take place, the smaller the “initial uncertainty” need be so that 
in the end left or right is “unpredictable” 


the outgoing directions upon leaving the nail slit. After hitting the next nail almost 
centrally, the ball can go to the left or to the right in the following row. Very tiny 
changes in the incoming position and velocity of the ball (entering the slit) result in 
drastically different outcomes: left or right. 

Small cause, big effect! But why is this so? In fact we have a spreading of di- 
rections due to the convex boundaries of the ball and the nail. In Fig. 4.3, the ball 
is idealized as a point and the nail as a fat cylinder. The pointlike ball bounces off 
the cylinder surface according to the rules of elastic collisions (ignoring the fact 
that the collisions are in reality inelastic). We see that two very close initial trajec- 
tories are far apart after so many collisions. In the Galton board, “far apart” means 
that the following nail will be hit with a left-going or right-going bias. One can go 
even further. Suppose we let the two incoming trajectories get closer and closer, 
then left-outgoing may become right-outgoing and then again left-outgoing, then 
right-outgoing again and so on and so forth. Note that the particular shape of the 
initial distribution of displacements from the ideal central position becomes irrel- 
evant, i.e., the particular details of the “initial randomness” with which every ball 
enters the board become irrelevant. 
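The mechanism "small cause, big effect" can be made concrete with a toy model. The sketch below does not model the collisions with the nails. As a loudly stated assumption, it replaces one passage through a nail slit by the doubling map x -> 2x mod 1, a standard example of an expanding map, and reads off "left" or "right" from the half of the unit interval in which the point lands. Exact rational arithmetic is used so that rounding does not interfere.

```python
from fractions import Fraction


def expanding_map(x):
    """Doubling map x -> 2x mod 1, a hypothetical stand-in for one slit passage."""
    return (2 * x) % 1


def left_or_right(x):
    """Coarse-grained outcome: left half of [0, 1) means L, right half means R."""
    return "L" if x < Fraction(1, 2) else "R"


def circle_distance(x, y):
    """Distance between x and y when [0, 1) is closed up into a circle."""
    d = abs(x - y) % 1
    return min(d, 1 - d)


def iterate(x, steps):
    """Apply the expanding map the given number of times."""
    for _ in range(steps):
        x = expanding_map(x)
    return x


def separation_history(x0, eps, steps):
    """Distances between the orbits of x0 and x0 + eps after 0, 1, ..., steps maps."""
    x, y = x0, x0 + eps
    history = []
    for _ in range(steps + 1):
        history.append(circle_distance(x, y))
        x, y = expanding_map(x), expanding_map(y)
    return history


if __name__ == "__main__":
    x0, eps = Fraction(1, 3), Fraction(1, 2**40)
    for k, d in enumerate(separation_history(x0, eps, 39)):
        if k % 13 == 0:
            print(k, float(d))
    print(left_or_right(iterate(x0, 39)), left_or_right(iterate(x0 + eps, 39)))
```

In this toy model the separation doubles at every step, so an initial difference of 2^-40 decides between left and right after 39 steps.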

Let us formalize this a little more with the help of Fig. 4.4, which should be read
from bottom to top. We look through a magnifying glass at the small initial uncertainty,
i.e., the range $\Delta$ over which the positions of the centers of the balls vary when the
ball enters the hopper (this imprecision of the Galton board machine will be further
scrutinized later). $\Delta$ should be thought of as both the interval and its length. Let us
assume for simplicity that the outlet of the hopper is also a nail slit. That way we
have a self-similar picture at each row, which allows us to formalize the spreading
in an idealized way. We do this in the following list:
Fig. 4.4 Looking at the “initial randomness” of the Galton board through a magnifying glass.
The initial uncertainty of the positions of the centers of the balls lies within $\Delta$, which denotes
here an interval and also its length. $\Delta$ is partitioned into cells by the functions $X_k : \Delta \to \{R, L\}$,
$k = 1, 2, \dots, n$, of the left or right moves of the ball when it hits a nail in row k. The partitions mix
together in a perfect way. The shape of the distribution $\rho$ of the points in $\Delta$ has no influence on the
outcomes $X_k$, because of the “spreading” character of the dynamics


1. Simplifying assumption. The trajectory of a ball through the Galton board is
completely determined by the initial position of the center of the ball (with respect
to the symmetry axis of the hopper) which may vary in $\Delta$. This means that
every left-right direction $X_k$, $k = 1, 2, \dots, n$, upon hitting a nail of the kth row is
a function on $\Delta$:

$$X_k : \Delta \to \{0 = L,\ 1 = R\}.$$

In particular then, the end position $Y = \sum_{k=1}^{n} X_k$ of a ball is a function on $\Delta$.

2. $X_k$ partitions $\Delta$ into L-R cells. On points in an L cell $X_k = 0 = L$, and likewise
for R cells. In other words the cells are the pre-images $X_k^{-1}(R)$ and $X_k^{-1}(L)$ of
$X_k$. The $X_k$ are coarse-graining functions of the interval $\Delta$.

3. The partition of $\Delta$ is very fine. Figure 4.4 shows $X_k^{-1}(R)$, $X_k^{-1}(L)$, $k = 1, 2$. The
partition by $X_1$ is already very fine. The instability of the motion now “spreads”
the cells of $X_1^{-1}$ further, i.e., it separates the points of every cell into L-R cells for
$X_2$. Here there might be a source of confusion which we must be clear about. The
$X_k$ are coarse-graining functions, i.e., in this case they map cells (a continuum!)
onto single points. The fine map which is given by the trajectories of the ball from
the actual set of initial conditions, before entering the next nail slit, to new sets of
initial conditions for entering the following slit is the one expanding in the sense
of Fig. 4.3. This expansion is now encoded by the coarse-graining functions $X_k$
in the finer and finer partitioning of the initial set $\Delta$ into cells.

4. This mixing partitioning of the cells continues at every stage k. 

5. The total size (added lengths of all intervals) of the L cells (R cells) is independent
of the stage k and equals roughly $\Delta/2$. For $\delta_k \in \{0, 1\}$, let $|X_k^{-1}(\delta_k)|$ denote
the length of the $\delta_k$ cells. Then from the foregoing, and looking at Fig. 4.4, we
may expect the mixing character of the partitioning to yield, for any distinct rows
$k_1, \dots, k_i$, the following (approximate) equality:

$$\frac{\left|X_{k_1}^{-1}(\delta_{k_1}) \cap \cdots \cap X_{k_i}^{-1}(\delta_{k_i})\right|}{\Delta} \approx \left(\frac{1}{2}\right)^{i} \approx \prod_{j=1}^{i} \frac{\left|X_{k_j}^{-1}(\delta_{k_j})\right|}{\Delta}. \tag{4.2}$$

This actually says that the coarse-graining functions form a family of independent
random variables, with respect to the measure given by the interval length,
or in view of point 6 below, with respect to any “decent” notion of measure for
the intervals. Note in passing that random variables are nothing but, and only,
coarse-graining functions, or if one prefers, coarse-graining variables!

6. Any “reasonable” weight $\rho$ of the points of $\Delta$ (the initial randomness, which
we shall address below) will give the same picture. Intuitively this means that
the details of this initial randomness are unimportant for the results of the Galton
board. And this is as it should be, otherwise the results would not be stable under
repairs to the board (which might be in order from time to time).

7. Theoretically (ignoring friction), the Galton board could be very large, i.e., the
number of rows could be as large as we wish, and thus ideally the partitioning
goes on forever.



Item 7 above asserts that the family of $X_k$, $k = 1, 2, \dots$, can be extended to arbitrary
length. But can it in fact? How can one be sure? Of course, we can be sure. At least,
nowadays we can, but at the beginning of the 20th century, this was a burning question:
Do coarse-graining functions $X_k$, $k = 1, 2, \dots$, on an interval exist, partitioning
the interval into finer and finer sets with an ideal mixing as we have described, going 
on forever? The answer lies at the heart of probability theory. The existence of such 
functions leads to the mathematical formulation of probability theory (as we teach it 
today). Their existence is intimately connected with the existence of real numbers, 
and we shall present probability theory in this genetic way. 

Anticipating later sections, we shall already say what the functions are, namely 
the Rademacher functions (Hans Rademacher 1892-1962) 


$$r_k : [0, 1] \to \{0, 1\}, \tag{4.3}$$

which are simply the coordinate maps of the binary sequence representing the real
number

$$x \in [0, 1], \qquad x = \sum_{k=1}^{\infty} r_k(x)\, 2^{-k}. \tag{4.4}$$

Let us quickly think about how these coarse-graining functions act: $r_1$ maps any
number $x \in [0, 1/2)$ to 0 and any $x \in [1/2, 1]$ to 1. Hence $r_1$ partitions the interval
$[0, 1]$ into two equal parts. $r_2$ then partitions $[0, 1/2)$ into $[0, 1/4)$ and $[1/4, 1/2)$
where all $x \in [0, 1/4)$ are mapped to 0 and all $x \in [1/4, 1/2)$ to 1, and it splits
$[1/2, 1]$ in the same way. And so on and so forth. We see the mixing.
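The mixing of the partitions can be checked exactly for finitely many of these functions. The following sketch is ours and rests on two conventions that should be read as assumptions: the k-th binary digit is computed as floor(2^k x) mod 2, and the point x = 1 is left out, which changes nothing as far as lengths are concerned. Since the first K digits are constant on each dyadic interval of length 2^-K, counting such intervals gives the length of a cell without any approximation.

```python
from fractions import Fraction
from itertools import product


def rademacher(k, x):
    """k-th binary digit of x in [0, 1), computed as floor(2^k * x) mod 2."""
    return int(x * 2**k) % 2


def cell_length(constraints):
    """Length of the set {x in [0, 1) : r_k(x) = d for every (k, d) in constraints}."""
    K = max(k for k, _ in constraints)
    hits = 0
    for j in range(2**K):
        x = Fraction(j, 2**K)  # left endpoint represents the whole dyadic interval
        if all(rademacher(k, x) == d for k, d in constraints):
            hits += 1
    return Fraction(hits, 2**K)


def all_cells_have_product_length(ks):
    """True if every cell {r_k = d_k for k in ks} has length (1/2)^len(ks)."""
    expected = Fraction(1, 2 ** len(ks))
    return all(
        cell_length(list(zip(ks, digits))) == expected
        for digits in product((0, 1), repeat=len(ks))
    )


if __name__ == "__main__":
    print(cell_length([(1, 1), (3, 0), (4, 1)]))
    print(all_cells_have_product_length((2, 5, 7)))
```

Each joint cell has exactly the product of the individual lengths, which is the ideal version of the approximate equality expected for the Galton board.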

It is also useful to see the following analogy. Suppose we know that a number
in [0,1], written in binary, starts with 0.0000. This means the number is between 0 and $1/2^4 = 1/16$.
What can be concluded for the next digit? Nothing. It is either zero or one. Sup- 
pose it is also zero. Then the number is smaller than 1/32, but we cannot conclude 
anything for the next digit. This goes on in the same way ad infinitum. And this 
corresponds perfectly to the way the ball runs through the nail slits. In particular, 
whatever direction it took before, nothing can be learned from that about the new 
direction. And nothing more detailed can be learned for the next ball’s trajectory, 
no matter how close its initial condition is to the previous ball (as long as it is not 
identical), and so it will move “unpredictably”. 

Smoluchowski thus describes in this early paper what became known and cel- 
ebrated much later as chaotic behavior. Following his reasoning, we have argued 
intuitively that it is very reasonable to assume that the left-right moves at each 
stage are statistically independent. In fact all we shall do later, when making this 
precise, is to define independence of (identically distributed) random variables by 
requiring the perfect mixing of partitions which are paradigmatically given by the 
Rademacher functions. 

So we have argued that the probability computation of m hits in n trials seems
appropriate, and on its basis we can predict the numbers of balls in the boxes by the 
law of large numbers. One should reflect for a moment on the law of large numbers. 
How does it read in fact, if we are to take the microscopic analysis and the coarse- 
graining functions seriously? We shall say more on that later! 

That is as far as Smoluchowski went. We understand more or less how random
causes (the initial uncertainty $\Delta$) can have lawlike effects. Because of the expansion
due to instability, we have something like statistical independence (no matter exactly
what the “initial uncertainty” is) and the law is this: the relative frequency of balls
in box m is typically $\binom{n}{m}/2^n$.

But we must still face Smoluchowski’s second question: Where does the random 
cause come from in the first place? What justifies the initial randomness? Let us 
be sure that we understand this question correctly. The first thought should be this. 
Suppose the first ball ends in box number 4. Why does the second ball not end up 
in the same box number 4, and in fact why do all the other balls not end up in 
the same box number 4? Impossible? Why? Think of a Galton board machine — a 
machine built with the precision of a Swiss clock, in which the mechanics is such 
Fig. 4.5 A Galton board machine. We have a well insulated container at temperature T with a huge
number of balls flying around in the container. These collide elastically with one another and with 
the inner walls of the container. Off and on, a ball drops through the opening and into the hopper 
of the Galton board. What are the predictions for the distributions of balls over the boxes at the 
bottom? 


that balls are taken and sent with Swiss-made precision into the board. All the balls 
have exactly the same initial data as the first ball and all balls end up in the same 
box the first ball ended up in. That is clear. That is one way to always have the 
balls in the same box. There may of course be other ways to ensure that the balls 
are always in the same box. Now you say that in the real experiment there is some 
initial randomness. That is true, but why is the initial randomness of such a kind that 
not all balls end up in the same box? 

To get a better grasp of this problem, we build another machine (see Fig. 4.5). A 
container of balls is kept at temperature T by a huge reservoir which is isolated from 
the rest of the world. The balls collide elastically with one another and with the walls 
of the container, and off and on a ball drops through the opening (slightly bigger 
than the diameter of a ball) at the bottom and drops through a somewhat bigger tube 
into the Galton board hopper. The container contains a huge number H of balls, a 
fraction of which makes up the large number N of balls which drop through the hole
into the board. Now we have a machine which has all its randomness “inside” and 
we ask: What can we now assert about the relative frequency of balls ending up in 
the n boxes? 
Let us be careful about this. What we have at some initial time (say the moment
before the hole in the bottom of the container is opened) is one configuration, or better,
one phase point $\omega = (q_1, \dots, q_H, v_1, \dots, v_H)$, where the $q_i$ are the centers of the
balls, taking values in the space occupied by the inner part of the container (ignoring
for simplicity the fact that they have a nonzero extension and cannot overlap), and
where the velocities $v_i$ can take arbitrary values. The collection of all possible $\omega$ is
the phase space $\Omega$. There is no longer any $\Delta$, so what are the $X_k$ we had previously
now functions of? We assume as before that the nails are infinitely heavy, i.e., that
they do not move at all (in reality, the nails will oscillate and we will have material
wear under collisions). Then the $X_k$ must now depend on $\omega$! But we need to be even
more careful! Numbering the balls which drop through the board, we now have for
the pth ball functions $X_k^p$, $k = 1, \dots, n$. Its end position after leaving the board is
described by the coarse-graining function $Y^p : \Omega \to \{0, 1, \dots, n\}$ given by the sum

$$Y^p(\omega) = \sum_{k=1}^{n} X_k^p(\omega).$$

If N balls drop through the board we have $Y^p(\omega)$, $p = 1, \dots, N$.
What now describes the distribution of balls over the boxes, i.e., what is the rela- 
tive frequency we wish to predict? For that we need only write down the empirical 
distribution for end positions: 


$$\rho_{\rm emp}^{N}(\omega, x) = \frac{1}{N} \sum_{p=1}^{N} \chi_{\{x\}}\big(Y^p(\omega)\big), \qquad x \in \{1, \dots, n\}, \tag{4.5}$$

where $\chi_{\{x\}}(y)$ is the indicator function, equal to 1 if y = x and zero otherwise. The
empirical distribution is also a coarse-graining function taking on discrete fractional 
values. What is now a prediction for this function? If we can convince ourselves that
the coarse-graining functions $Y^p(\omega)$ are stochastically independent (we must convince
ourselves that they produce a Rademacher-like mixing partition, but instead of
the interval $\Delta$, it is now the huge space $\Omega$ which is partitioned), we can prove a law
of large numbers as follows. When N is large, the empirical distribution $\rho_{\rm emp}^{N}(\omega, x)$
is typically the distribution we expect, namely

$$\rho_{\rm emp}^{N}(\omega, x) \approx \frac{1}{2^n} \binom{n}{x}.$$


“Typically” means that the values are obtained for the overwhelming majority of
$\omega$s. But there is a catch. For stochastic independence we need a measure on $\Omega$,
measuring the sizes of the cells of the coarse-graining functions, and we need that
measure to tell us what “typically” or equivalently “overwhelmingly many” means.
Hence the assertion (the typicality assertion or law of large numbers) is made with
respect to some measure, which will be the analogue of the length of an interval or
the density $\rho$ on $\Delta$, which we discussed in connection with Fig. 4.4. The appropriate
measure will be discussed later; let us simply call it $\mathbb{P}_T$ for the moment. For
concreteness we now write down the assertion of the law of large numbers in the
familiar formal way, i.e., for all $\varepsilon > 0$ and all $\eta > 0$, and for N large enough,

$$\mathbb{P}_T\left(\left\{\omega : \left|\rho_{\rm emp}^{N}(\omega, x) - \frac{1}{2^n}\binom{n}{x}\right| < \varepsilon\right\}\right) > 1 - \eta. \tag{4.6}$$



So “typically” means that the set of $\omega$s, i.e., the set of initial phase points of the balls
in the container which lead to the observed relative frequency of balls in the boxes
has overwhelmingly large $\mathbb{P}_T$ measure. In statistical physics, $\mathbb{P}_T$ will commonly be
assumed quite simply to be of such and such a form. This is called a statistical
hypothesis, which must be argued for, a point we shall take up later.
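The structure of this typicality statement can be illustrated in a toy setting, which is emphatically not the container of Fig. 4.5. In the sketch below, and these are all assumptions made for illustration, a single "phase point" is one long string of bits stored in an integer, ball p reads its n left/right moves from the p-th block of n bits, and the length measure on such strings (sampled with a seeded pseudo-random generator) stands in for the measure of typicality. The function then estimates how large a fraction of the sampled phase points produce an empirical distribution within a given tolerance of the binomial weights.

```python
import random
from math import comb


def end_positions(omega_bits, n, num_balls):
    """End position of each ball: number of 1-bits in its block of n bits."""
    mask = (1 << n) - 1
    return [bin((omega_bits >> (p * n)) & mask).count("1") for p in range(num_balls)]


def empirical_distribution(positions, n):
    """Relative frequency of each end position x = 0..n."""
    total = len(positions)
    return [positions.count(x) / total for x in range(n + 1)]


def max_deviation(omega_bits, n, num_balls):
    """Largest gap between empirical frequencies and binomial weights for one phase point."""
    emp = empirical_distribution(end_positions(omega_bits, n, num_balls), n)
    return max(abs(emp[x] - comb(n, x) / 2**n) for x in range(n + 1))


def typical_fraction(n, num_balls, eps, num_points, seed=0):
    """Fraction of sampled phase points whose maximal deviation stays below eps."""
    rng = random.Random(seed)  # seeded for reproducibility
    good = 0
    for _ in range(num_points):
        omega_bits = rng.getrandbits(n * num_balls)  # one sampled "phase point"
        if max_deviation(omega_bits, n, num_balls) < eps:
            good += 1
    return good / num_points


if __name__ == "__main__":
    print(typical_fraction(n=4, num_balls=2000, eps=0.05, num_points=200))
    print(max_deviation(0, 4, 2000))  # the all-zero phase point is atypical
```

The all-zero string plays the role of the machine built with Swiss-made precision: it is a perfectly possible phase point, but it belongs to the small exceptional set.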

Now let us be clear about what all this means for the experiment. When we start 
the experiment we have in the container one and only one configuration of balls, 
i.e., one phase space point $\omega$ in the phase space $\Omega$ of the container system and not
a distribution of phase space points. This phase point will produce the $(1/2^n)\binom{n}{x}$
relative frequency of balls in the boxes. Why? Because overwhelmingly many such 
phase points, i.e., the typical phase point, would do that. The role of the statistical 
hypothesis is nothing other than to define typicality. 

What about the proof? Now, if it were the case that the $Y^p$ partitioned the phase
space $\Omega$ in the mixing kind of way, and also that the $X_k^p$ partitioned $\Omega$ in an
equal-sized-cell kind of way (where size is now determined by the measure $\mathbb{P}_T$), then the
law of large numbers would be immediate. In the formalized language of probability 
theory, the conditions simply mean independence of the random variables. We shall 
recall in the probability section the trivial proof of the law of large numbers when 
independence holds. This does not mean, however, that the law of large numbers is 
trivial. By no means! What is not at all trivial, better, what is outrageously difficult, 
is to prove that the conditions hold true, namely that the coarse-graining functions 
partition in just the right mixing kind of way. Establishing this is so exceedingly 
difficult that the proof of the law of large numbers (4.6) for the realistic Galton 
board is practically impossible. 

One moral of the story is this. Independence is easily said and easily modeled, but 
extremely hard to establish for any realistic situation. This helps to explain the role 
and meaning of probability in physics. That is, one must first reduce probabilistic 
statements to their ultimate physical basis. Doing this, probability seems to go away, 
since one then deals with a purely analytical statement, just as in (4.6): the law of 
large numbers establishing what typical frequencies look like. Much of the confusion 
about probability arises because the true depth of the law of large numbers as an 
extremely hard analytical assertion is not appreciated at all. 

We are almost done now. Only two questions remain: What is $\mathbb{P}_T$, and what
justifies it? In a loose manner of speaking, reintroducing probabilities, one could ask:
Where does this “initial randomness” come from? This loose manner of speaking is
easily misleading, as it suggests that the law of large numbers is a probabilistic statement
and $\mathbb{P}_T$ is a probability. Up until now, our presentation has nowhere made use
of such an interpretation, and it will not do so. After this warning let us repeat the 
question: Where does the randomness come from? The most common answer goes 
like this. Systems like the one we constructed in Fig. 4.5 must be built, so they were 
not always closed, and there is some randomness coming from the outside. This is 
true. Someone must have built the machine and put the balls in the container, and 
someone must have heated up the container, or even mixed the balls up by shaking 
it, so randomness does seem to come from outside. But then take the larger system 
including the builder of the machine and whatever it requires. The typicality assertion
is now shifted to a much larger phase space with a new measure $\mathbb{P}_{E+S}$ (where
E+S stands for environment plus system), and new coarse-graining functions on
that larger phase space. 

Thus we see that the question remains, only it is now for a larger system. What 
accounts for that randomness? Is it once again the outside of that larger system? 
Well then we go on and include the new outside, and we see that there seems to be 
no end to that. But there is an end! The end comes when we consider the largest clo- 
sed system possible, the universe itself. For the universe the question now becomes 
rather embarrassing. There is no outside to which the source of randomness could 
be shifted. In fact, we have now arrived at the heart of the second of Smoluchowski’s 
questions: How can randomness arise if all events are reducible to deterministic laws 
of nature? Or in other words, how can lawlike causes have random effects? 


4.1.2 Statistical Hypothesis and Its Justification 


We come now to Boltzmann’s insight. What is the source of randomness when we 
have shifted the possible source to ever larger systems, until we reach the universe 
itself? Let us make two remarks. Firstly, if one thinks of randomness as probability, 
where probability is thought of as a notion which is based on relative frequencies in 
ensembles, as many people do, then that thought must end here. There is only one 
universe, our own, and sampling over an ensemble of universes gives no explanation 
for the occurrence of probability in our universe. Secondly, we did put up a big 
show, dramatically extending the question of the randomness from the little system 
to the whole universe in which the little system is only a part, as if that escape to 
the environment had any meaning. The question is, however, simple and direct. The 
physical laws are deterministic. How can there be randomness? 

The answer can be made clear if we revisit the Galton board machine. After 
all, that could be a universe. In a typical universe we can have regular empirical 
frequencies, just the way we experience them in experiments, although the universe 
evolves deterministically. In a typical universe, things may look random, but they 
are not. In Boltzmann’s words, for the overwhelming majority of universes it is true 
that, in ensembles of subsystems of a universe, the regular statistical patterns occur 
as we experience them. 
So there is something to prove, namely, a law of large numbers for the universe.
For example, for the Galton board machine as part of our universe, one would have
to prove that in a typical universe⁶ the balls fall into the boxes with just the right
frequency. Such an assertion is usually called a prediction of the theory. Which measure
defines a typical universe? We shall come to that! It is of course impracticable
in proofs to have to deal with the whole universe, and one would therefore first see
whether one could infer from the measure, which defines typicality, a measure of
typicality for a subsystem, like the Galton board machine. That measure would be
$\mathbb{P}_T$.

The introduction of such measures of typicality for subsystems of the universe 
is due to Boltzmann and Willard Gibbs (1839-1903). Gibbs preferred to call the 
measure of typicality an ensemble. After all, for subsystems of the universe, we can 
imagine having lots of copies of them, i.e., we can imagine many identical Galton 
board machines, so we can imagine having a true ensemble and we can sample, in 
a typical universe, the empirical distribution of $\omega$s. Then $\mathbb{P}_T$ would arise as relative
frequency itself.⁷ However, this understanding is hopeless when it comes to the
measure of typicality for the universe. It is therefore best to take the name “Gibbs 
ensemble” as synonymous with “measure of typicality”. In the end it all reduces to 
one typical universe, as it must, since we have only one universe at our disposal, 
namely the one we live in. The best we can do is to prove an analytical statement 
about statistical regularities for ensembles of subsystems within a typical universe. 
A typical universe shows statistical regularities as we perceive them in a long run of 
coin tosses. It looks as if objective chance is at work, while in truth it is not. There 
is no chance. That is the basis of mathematical probability theory. 

Is that all there is to this? Yes, more or less. We did answer Smoluchowski’s sec- 
ond question, but there is a price to pay. The price is that we must prove something 
that is exceedingly difficult to prove and moreover we must answer the following 
question: What measure tells us which sets of the phase space of the universe are 
overwhelmingly large and which sets are small? In other words: Which measure 
defines the typical universe? 

Now at first glance the measure of typicality does not seem to be unique. Recall 
that in our discussion of the Galton board, the distribution of points in $\Delta$ (which
could be such a measure of typicality) can be rather arbitrary as long as it does not 
vary on the very small cell scale. That holds for all measures of typicality. Typicality 
is really an equivalence class notion. All measures in a certain equivalence class 
(absolute continuity of measures would be an ideal class) define the same notion of 


6 Typicality with a grain of salt. A universe in which a Galton board experiment takes place might 
itself not be typical. We must eventually understand typicality conditionally, conditioned by macro- 
scopic constraints which govern our universe, for example. This issue will resurface in Sect. 4.2, 
and when we justify Born’s statistical law for the wave function in Chap. 11. 

7 There is a danger when too much emphasis is put on ensembles and distributions over ensembles. 
One is easily led astray and forgets that, for the molecules in a gas, while moving erratically around, 
one has at every instant of time one and only one configuration of gas molecules. The question is: 
What is the typical configuration? 
typicality. Nevertheless there is a particularly nice representative of the equivalence
class which we would like to introduce next. 
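The insensitivity to the choice of measure, as long as it does not vary on the scale of the cells, can be made quantitative in a simple case. The sketch below is ours and uses two assumptions for illustration: the cells are those of the k-th binary digit on the unit interval, and the non-uniform weight is the hypothetical density 2x, entered through its cumulative distribution function x^2. The weight of the "right" cells is computed exactly with rational arithmetic.

```python
from fractions import Fraction


def weight_of_right_cells(k, cumulative):
    """Weight of {x in [0, 1) : k-th binary digit of x is 1} for a given CDF."""
    total = Fraction(0)
    # The k-th digit equals 1 exactly on the intervals [j/2^k, (j+1)/2^k) with odd j.
    for j in range(1, 2**k, 2):
        total += cumulative(Fraction(j + 1, 2**k)) - cumulative(Fraction(j, 2**k))
    return total


def uniform_cdf(x):
    """CDF of the length measure on [0, 1]."""
    return x


def linear_density_cdf(x):
    """CDF F(x) = x^2 of the hypothetical density 2x on [0, 1]."""
    return x * x


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, weight_of_right_cells(k, uniform_cdf),
              float(weight_of_right_cells(k, linear_density_cdf)))
```

For this density the weight works out to 1/2 + 2^-(k+1), so the deviation from the length measure is halved with every refinement of the partition.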

The measures we are talking about are a kind of volume for the subsets of the 
tremendously high-dimensional phase space of the universe. Let us think of volume 
as a generalization of the volume of a ball or a cube to very high-dimensional phase 
spaces. Let us call the measure on phase space P. What is a good P? Let us recall 
what role it has to play. It must define typicality, but it must do so in the form of 
the law of large numbers for empirical distributions. That is, we must be able to 
consider ensembles of subsystems of the universe now, tomorrow, and at any time 
in principle, and it is intuitively clear that one particularly nice property of such a 
measure would be technically very welcome: it should not change with time, so that 
what is typical now is also manifestly typical at any other time. 

The simplest way to ensure this is to require that the volume measure we are looking
for be invariant under the time evolution, i.e., it must be a stationary measure. Apart from the
technical advantage when proving things, the notion of typicality being timeless is 
appealing in itself. Time-independent typicality is determined by the physical law, 
i.e., time-independent typicality is given to us by physics. To see this more clearly, 
recall our discussion about stationarity in Remark 2.2 on p. 19. For a Hamiltonian 
universe, we have Liouville’s theorem (2.12), which asserts that the phase space 
volume does not change with time under the Hamiltonian flow. The law of physics 
gives us a physical notion of typicality based on time independence. It is as reason- 
able to take that notion as relevant for our understanding of the universe as the law 
itself. 
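The preservation of phase space volume can be probed numerically in one degree of freedom. The sketch below makes several assumptions that are ours and not part of the text: the Hamiltonian is that of a pendulum, H = p^2/2 - cos q, the flow is replaced by the leapfrog discretization (a composition of shear maps, each of which preserves area), and the Jacobian determinant of the resulting map is estimated by central finite differences. A determinant equal to 1 means that small phase space areas are carried along without change of size.

```python
import math


def leapfrog_step(q, p, dt):
    """One kick-drift-kick step for the pendulum H = p^2/2 - cos q."""
    p_half = p - 0.5 * dt * math.sin(q)
    q_new = q + dt * p_half
    p_new = p_half - 0.5 * dt * math.sin(q_new)
    return q_new, p_new


def evolve(q, p, dt, steps):
    """Apply the leapfrog step repeatedly."""
    for _ in range(steps):
        q, p = leapfrog_step(q, p, dt)
    return q, p


def jacobian_determinant(q, p, dt, steps, h=1e-6):
    """Finite-difference estimate of det d(q', p')/d(q, p) for the evolved map."""
    q_a, p_a = evolve(q + h, p, dt, steps)
    q_b, p_b = evolve(q - h, p, dt, steps)
    q_c, p_c = evolve(q, p + h, dt, steps)
    q_d, p_d = evolve(q, p - h, dt, steps)
    dq_dq = (q_a - q_b) / (2 * h)
    dp_dq = (p_a - p_b) / (2 * h)
    dq_dp = (q_c - q_d) / (2 * h)
    dp_dp = (p_c - p_d) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq


if __name__ == "__main__":
    print(jacobian_determinant(0.7, 0.3, dt=0.01, steps=500))
```

The estimate carries the usual finite-difference and rounding errors, so agreement with 1 should only be expected up to a small tolerance.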

It is natural to guess that the statistical hypothesis to define typicality for sub- 
systems should also appeal to stationarity. Boltzmann felt this way and, indepen- 
dently of Gibbs, introduced the measures which are known nowadays as the canon- 
ical ensembles or Gibbs ensembles. Indeed, the mathematical physicist Gibbs saw 
stationarity as a good requirement for a statistical hypothesis on which he based 
an axiomatic framework of statistical mechanics for subsystems. Gibbs’ axiomatic 
framework is immediately applicable to the thermostatics (equilibrium thermody- 
namics) of subsystems, and explains his success, while Boltzmann’s work was less 
widely recognized. 

As already mentioned, Gibbs talks about distributions over ensembles and is not 
concerned with the actual phase point the system occupies. On that basis, he gave a 
justification for the use of Gibbs ensembles that is neither necessary nor sufficient, 
connected to the so-called mixing and convergence of non-equilibrium measures to 
equilibrium measures (see the next section). This is not necessary because typicality 
does the job, and it is not sufficient, because it is not linked to the actual trajectory of 
the system under consideration. A system is always in a particular configuration and 
never “in a probability distribution”. Gibbs’ view seems to deviate from typicality 
and Boltzmann’s view of the world.⁸ Historically Gibbs’ view seems persistent,
while Boltzmann’s understanding that the role of the hypothetical Gibbs ensembles 
is to define typicality was lost. 


8 The Ehrenfests criticize Gibbs’ view correctly [5]. 
We now look at some stationary measures relevant in Hamiltonian subsystems.
We do that to sharpen our intuition for the quantum equilibrium analysis we shall 
undertake once we have introduced Bohmian mechanics. The analysis there is much 
simpler than what we do next. The important message is, however, that the basic 
structures are the same, something which is not recognized in textbooks. 


4.1.3 Typicality in Subsystems: 
Microcanonical and Canonical Ensembles 


Recall (2.26), viz., 


$$\frac{\partial}{\partial t}\rho(x,t) = -v^{H}(x,t)\cdot\nabla\rho(x,t)\,.$$

We look for stationary solutions of this equation, i.e., we search for a density p for 
which the time derivative on the left-hand side is zero. According to Remark 2.2, 
a stationary density yields a stationary measure. Now the right-hand side of the 
equation is 


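$$v^{H}\cdot\nabla\rho = \left(\frac{\partial H}{\partial p}\cdot\frac{\partial}{\partial q} - \frac{\partial H}{\partial q}\cdot\frac{\partial}{\partial p}\right)\rho = \left(\dot q\cdot\frac{\partial}{\partial q} + \dot p\cdot\frac{\partial}{\partial p}\right)\rho(q,p) = \frac{d}{dt}\,\rho\bigl(q(t),p(t)\bigr)\,,$$
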
which is the change in the function p along the system trajectories. When we ask 
for this to be zero, we ask for functions which are constant along trajectories. One 
such function is H(q,p), which is just energy conservation. Hence every function 
f(q,p) = f(H(q,p)) is conserved. Two functions play a particular role, as we shall
explain. The simplest is 


$$\rho = f(H) = \frac{e^{-\beta H}}{Z(\beta)}\,, \tag{4.7}$$

where $\beta$ is interpreted thermodynamically as $\beta = 1/k_B T$ with $T$ the temperature,
and $k_B$ is the Boltzmann constant. The latter is a dimensional factor, which allows
one to relate thermodynamic units and mechanical units. We shall say more on that
later. $Z(\beta)$ is the normalization.⁹ The function defined by (4.7) determines a measure
which is commonly called the canonical ensemble. Its role will become clear
in a moment.
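The stationarity argument can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes a pendulum Hamiltonian H(q,p) = p^2/2 - cos(q) in dimensionless units (our choice for illustration) and a leapfrog integrator, and it verifies that the unnormalized canonical weight exp(-beta H) stays constant along a trajectory up to the integration error.

```python
import math

def hamiltonian(q, p):
    """Illustrative pendulum energy H(q, p) = p^2/2 - cos(q) (an assumed example)."""
    return 0.5 * p * p - math.cos(q)

def leapfrog(q, p, dt, steps):
    """Integrate dq/dt = dH/dp = p and dp/dt = -dH/dq = -sin(q) with the leapfrog scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * math.sin(q)
        q += dt * p
        p -= 0.5 * dt * math.sin(q)
    return q, p

def canonical_weight(q, p, beta):
    """Unnormalized canonical density exp(-beta * H(q, p))."""
    return math.exp(-beta * hamiltonian(q, p))

beta = 1.0
q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, dt=1e-3, steps=20000)
w_start = canonical_weight(q0, p0, beta)
w_end = canonical_weight(q1, p1, beta)
# Relative change of the density along the trajectory; it should be tiny.
relative_drift = abs(w_end - w_start) / w_start
print(relative_drift)
```

Any other function of H would pass the same check, which is the content of the statement that every f(H(q,p)) is conserved.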


9 We wish to define typicality with this measure and we wish to call the typical (i.e., the predicted)
relative frequencies simply probabilities, as physicists normally do. For this purpose it is
convenient to normalize the typicality measure like a probability measure.
The somewhat more complicated density is the following. It arises from the preserved
volume measure on phase space, but taking into account the fact that energy
is conserved. Energy conservation (2.9) partitions the phase space¹⁰ $\Omega$ into surfaces
of constant energy $\Omega_E$:

$$\Omega = \bigcup_E \Omega_E\,.$$


When we think of an isolated system its phase point will remain throughout the time 
evolution on the energy surface defined by the value of the system energy E. Which 
function defines a stationary density on the energy surface $\Omega_E$? Formally, this is
clear:

$$\rho_E = \frac{1}{Z(E)}\,\delta\bigl(H(q,p) - E\bigr)\,, \qquad Z(E) = \text{normalization}. \tag{4.8}$$


This then defines the stationary measure, the so-called microcanonical ensemble:

$$P_E(A) = \int_A \frac{1}{Z(E)}\,\delta\bigl(H(q,p) - E\bigr)\,d^{3N}q\,d^{3N}p\,, \qquad A \subset \Omega_E\,. \tag{4.9}$$


The $\delta$-function may seem scary for some readers. So let us rewrite the measure,
introducing the area measure $d\sigma_E$ on the surface. The point is that (4.8) is in general
not simply the surface measure (when the energy surface is not a ball), but it has a
nontrivial density. There are various ways to see this. Consider a surface given implicitly
by $f(x_1,\dots,x_n) = c$ and suppose that we can solve for $x_n = g_c(x_1,\dots,x_{n-1})$,
i.e., we can parameterize the surface by the vector

$$y = \bigl(x_1,\dots,x_{n-1},\,g_c(x_1,\dots,x_{n-1})\bigr)\,.$$

Then we know from analysis that

$$d\sigma_c = \frac{\|\nabla f\|}{|\partial_n f|}\,dx_1\cdots dx_{n-1}\,. \tag{4.10}$$

On the other hand, for any nice function $h$, by change of variables

$$x_n \to y = f(x_1,\dots,x_n)\,, \qquad x_n = g_y(x_1,\dots,x_{n-1})\,,$$

we have


10 In probability theory, it is common to call the phase space probability space denoted by $\Omega$,
but be aware that this space has nothing to do with probability, randomness, or chance, despite its 
name. Coarse-graining functions of phase space are called random variables, and they too have 
nothing to do with randomness per se. 
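$$\begin{aligned}
\int h(x_1,\dots,x_n)\,\delta\bigl(f(x_1,\dots,x_n) - c\bigr)\,dx_1\cdots dx_n
&= \int h\bigl(x_1,\dots,g_y(x_1,\dots,x_{n-1})\bigr)\,\frac{1}{|\partial_n f|}\,\delta(y-c)\,dx_1\cdots dx_{n-1}\,dy \\
&= \int h\bigl(x_1,\dots,x_n = g_c(x_1,\dots,x_{n-1})\bigr)\,\frac{1}{|\partial_n f|}\,dx_1\cdots dx_{n-1}\,.
\end{aligned}$$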


Hence we see that the $\delta$ density acts like a surface integral, and by comparison we
find that

$$\delta(f - c)\,dx_1\cdots dx_n = \frac{1}{\|\nabla f\|}\,d\sigma_c\,. \tag{4.11}$$


Hence, applying this to our case, for the microcanonical ensemble $P_E$ given by (4.9),
we have the result

$$P_E(A) = \int_A \frac{1}{Z(E)}\,\frac{d\sigma_E}{\|\nabla H\|}\,, \qquad A \subset \Omega_E\,. \tag{4.12}$$


Here is a more pictorial way to arrive at that. Divide the volume element $d^{3N}q\,d^{3N}p$
of $\Omega$ into the surface element $d\sigma_E$ on $\Omega_E$ and the orthogonal coordinate denoted by
$l$, viz.,

$$d^{3N}q\,d^{3N}p = d\sigma_E\,dl\,.$$

The element on the left is invariant under the Hamiltonian flow, by Liouville’s theorem.
$dH$ is also invariant. Here $dl$ is the $l$-coordinate difference between the two energy
surfaces $\Omega_E$ and $\Omega_{E+dE}$ (see Fig. 4.6), so that on the right-hand side, we should write

$$d\sigma_E\,\frac{dl}{dH}\,dH\,.$$

Then

$$d\sigma_E\,\frac{dl}{dH}$$

must be invariant. It remains to be seen what $dl/dH$ is. Now $\nabla H$ is orthogonal to
$\Omega_E$, i.e., parallel to the coordinate line $l$, so that

$$dH = \|\nabla H\|\,dl\,,$$

and hence

$$\frac{dl}{dH} = \frac{1}{\|\nabla H\|}\,.$$

Fig. 4.6 Transport of phase space volume between two energy surfaces

Look again at Fig. 4.6. The surface measure density $1/\|\nabla H\|$ takes into account
the fact that the system trajectories moving between two energy shells must move
faster when the shells get closer, since the phase space volume must be preserved
by the flow lines. Accordingly, in the case of spherical energy surfaces ($\|\nabla H\|$ =
constant), the surface Liouville measure is simply the surface measure.
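The role of the density $1/\|\nabla H\|$ can be made concrete on a non-spherical energy surface. The sketch below is an illustration under our own assumptions, not an example from the text: it takes the one-dimensional Hamiltonian H(q,p) = q^2/2 + 2p^2 at E = 1, whose energy surface is an ellipse parameterized by an angle theta, and compares the plain arc-length fraction of an arc with its fraction under the measure $d\sigma_E/\|\nabla H\|$. For this oscillator the phase point moves at a constant rate in theta, so the Liouville fraction of the arc should equal theta_max/(2 pi), the fraction of time spent there.

```python
import math

def arc_fractions(theta_max, steps=20000):
    """Return (plain arc-length fraction, dsigma/|grad H| fraction) of the arc
    0 <= theta <= theta_max on the ellipse H(q, p) = q^2/2 + 2 p^2 = 1."""
    energy = 1.0

    def weights(theta):
        q = math.sqrt(2 * energy) * math.cos(theta)
        p = math.sqrt(energy / 2) * math.sin(theta)
        # Arc length per unit theta along the parameterized ellipse.
        dq = -math.sqrt(2 * energy) * math.sin(theta)
        dp = math.sqrt(energy / 2) * math.cos(theta)
        arc = math.hypot(dq, dp)
        grad = math.hypot(q, 4 * p)  # |grad H| with grad H = (q, 4p)
        return arc, arc / grad

    def integrate(upper, index):
        # Midpoint rule on [0, upper].
        h = upper / steps
        return h * sum(weights((i + 0.5) * h)[index] for i in range(steps))

    plain = integrate(theta_max, 0) / integrate(2 * math.pi, 0)
    liouville = integrate(theta_max, 1) / integrate(2 * math.pi, 1)
    return plain, liouville

plain, liouville = arc_fractions(math.pi / 4)
print(plain, liouville)
```

The two fractions differ noticeably, which is the point of the remark that (4.8) is in general not simply the surface measure.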

Boltzmann’s and Einstein’s view (Einstein reworked Boltzmann’s ideas for himself)
is that the microcanonical measure, which is the projection of the natural Liouville
measure onto the energy surfaces, is the typicality measure for the (Newtonian)
universe. Why should this be natural? The universe is a closed thermodynamical 
system, since there is no energy exchange with an outside, i.e., the energy of the 
universe does not fluctuate. The distribution (4.7) has energy fluctuations, while 
(4.12) does not. So it is a natural choice. Why does (4.7) appear at all? We shall 
explain that. 

Let us start with the simplest thermodynamical system, namely an ideal gas in a 
container. Ideal gas particles do not interact with one another and, assuming com- 
plete isolation, the gas system has only kinetic energy. For N equal mass particles 
with velocities $(v_1,\dots,v_N) \in \mathbb{R}^{3N}$, the Hamiltonian is

$$H(q,p) = \frac{1}{2}\sum_{i=1}^N m v_i^2 = \sum_{i=1}^N \frac{p_i^2}{2m}\,,$$

and $H = E$ are preserved surfaces under the ideal gas dynamics. The positions of
the gas molecules are contained in the volume $V$. The microcanonical measure (4.8)
for the gas system is then

$$P_E(d^{3N}q\,d^{3N}p) = \frac{1}{V^N}\,d^{3N}q\,\frac{1}{Z(E)}\,\delta\Bigl(\sum_{i=1}^N \frac{p_i^2}{2m} - E\Bigr)\,d^{3N}p\,, \tag{4.13}$$

where we now set

$$Z(E) = \int_{\mathbb{R}^{3N}} \delta\Bigl(\sum_{i=1}^N \frac{p_i^2}{2m} - E\Bigr)\,d^{3N}p\,,$$

since for the ideal gas the volume normalization can be simply factorized.
Since the positions are uniformly distributed, we focus attention on the momentum,
or what comes to the same up to a constant, the velocity distribution of that
measure, which we obtain when we integrate all positions over the volume $V$. Then,
with a slight abuse of notation, still denoting the normalization factor by $Z(E)$, we
have

$$P_E(d^{3N}v) = \frac{1}{Z(E)}\,\delta\Bigl(\sum_{i=1}^N \frac{m v_i^2}{2} - E\Bigr)\,d^{3N}v\,. \tag{4.14}$$

The delta distribution forces a complicated dependence among the velocities. The
velocity vector $(v_1,\dots,v_N)$ lies on the spherical surface $S^{(3N)}_{\sqrt{2E/m}}$. Changing variables,
referring to (4.11), and computing

$$\Bigl\|\nabla \sum_{i=1}^N v_i^2\Bigr\|$$

on the spherical surface $S^{(3N)}_{\sqrt{2E/m}}$, we can express (4.14) in the form

$$P_E(A) = \int_A \frac{d\sigma^{(3N)}_{\sqrt{2E/m}}}{\bigl|S^{(3N)}_{\sqrt{2E/m}}\bigr|}\,, \qquad A \subset S^{(3N)}_{\sqrt{2E/m}}\,, \tag{4.15}$$

where $\bigl|S^{(3N)}_{\sqrt{2E/m}}\bigr|$ is the size of the surface $S^{(3N)}_{\sqrt{2E/m}}$ and $d\sigma^{(3N)}_{\sqrt{2E/m}}$ its surface element.

This is not very informative. Suppose we would like to know the so-called marginal
distribution of only one velocity component, say $(v_1)_x$, which we obtain by choosing

$$A = \Bigl\{(v_1,\dots,v_N) \in S^{(3N)}_{\sqrt{2E/m}} : (v_1)_x \in [a,b]\Bigr\}$$

or equivalently by integrating over all velocities in (4.14) except $(v_1)_x \in [a,b]$.
What would we get? Or suppose we asked for the marginal joint distribution of
$(v_1)_x, (v_5)_y$, then what would we get?

Clearly, the answer is complicated, since all the velocities depend on each other.
But intuitively, if the number $N$ is huge (as it is in the gas system, of the order of
Avogadro’s number), the dependencies must become weak. Better still, if $N$ gets
large and with it the energy $E$, so that

$$\frac{E}{N} = c\,,$$

we should expect the marginal distributions to attain characteristic forms depending
on the value $c$, but independent of $N$. Before we show this, let us think about $c$. It
does look like the empirical average of the single-particle kinetic energy. If the law
of large numbers applies, we have

$$\begin{aligned}
c = \frac{E}{N} = \frac{1}{N}\sum_{i=1}^N \frac{m}{2}v_i^2 \approx \frac{m}{2}\langle v^2\rangle_{E/N}
&= \int \frac{m}{2}\,v_1^2\,\frac{1}{Z(E)}\,\delta\Bigl(\sum_{i=1}^N \frac{m v_i^2}{2} - E\Bigr)\,d^{3N}v \\
&= \frac{m}{2}\int v^2 f_c(v)\,d^3v\,.
\end{aligned}$$


Here $f_c$ denotes the marginal distribution of the velocity of one particle in the situation
where $E/N = c$ and $N$ is large. This average is well known, and can be
determined from a completely different view which is basic to all of Boltzmann’s
work, in fact to the whole of kinetic theory: the connection with thermodynamics!
The gas in the container obeys the ideal gas law $pV = nRT$. The way we have formulated
the law, it is already atomistic since it involves the number $n$ of moles in
the volume $V$. $R$ is the gas constant, $p$ the pressure, $V$ the volume, and $T$ the temperature.
Introducing Avogadro’s number $N_A$, the number of molecules in the mole, we
obtain

$$pV = N k_B T\,, \tag{4.16}$$

with Boltzmann’s constant $k_B = R/N_A$ and $N = nN_A$ the total number of gas
molecules. Boltzmann’s constant should be thought of as the heat capacity of one
gas particle. This becomes clear when we put the above average in relation with 
the pressure of the gas. In kinetic theory the pressure arises from the gas molecules 
hitting the walls of the gas container: 


$$p = \frac{\text{force}}{\text{area}} = \frac{\text{momentum transfer during } \Delta t}{\Delta t \times \text{area}}\,, \tag{4.17}$$

where $\Delta t$ is a short time interval in which very many molecules collide with the wall
area. Suppose the area $A$ has normal vector in the $x$ direction. Then the momentum
transfer is $2mv_x$, since the particle is elastically reflected. The number of particles
with $x$-components of the velocity in the range $[v_x, v_x + dv_x]$ colliding with the area
$A$ and within $\Delta t$ is roughly

$$N(v_x, dv_x, \Delta t, A) = v_x\,\Delta t\,A\,f_c(v_x)\,dv_x\,\frac{N}{V}\,.$$

Fig. 4.7 A particle with velocity $v$ must be in the collision cylinder to hit the wall within $A$ and
within time $\Delta t$. The volume of that cylinder is $v_x A\,\Delta t$

Consider Fig. 4.7. $v_x\,\Delta t\,A$ is the volume of the cylinder (Boltzmann’s collision cylinder),
in which a particle with this $x$-component of velocity must be in order to collide
with the area $A$ in the given time interval. The “probability” for a particle to be in
that cylinder is

$$\frac{N}{V}\,f_c(v_x)\,,$$

where $N/V$ is the spatial density of particles and $f_c(v_x)$ is the density of the $x$-component
of the velocity distribution. We thus obtain the above number denoted by
$N(v_x, dv_x, \Delta t, A)$. Actually we do not need to know what $f_c(v_x)$ looks like, except that
we require the symmetry $f_c(v_x) = f_c(-v_x)$. Each particle transfers the momentum
$2mv_x$, so considering the symmetry and integrating over all relevant $v_x$, we get


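$$\begin{aligned}
\text{momentum transfer during } \Delta t &= 2m\,\frac{N}{V}\,\Delta t\,A \int_0^\infty v_x^2 f_c(v_x)\,dv_x \\
&= m\,\frac{N}{V}\,\Delta t\,A \int_{-\infty}^\infty v_x^2 f_c(v_x)\,dv_x\,.
\end{aligned}$$

Hence, in view of (4.17), we obtain for the pressure

$$p = m\,\frac{N}{V}\int_{-\infty}^\infty v_x^2 f_c(v_x)\,dv_x\,,$$

that is,

$$pV = N\,\langle m v_x^2\rangle_{E/N}\,.$$

Comparing with (4.16), we see that $\langle m v_x^2\rangle_{E/N} = k_B T$. This important result is called
the equipartition theorem. The average kinetic energy of a particle is

$$c = \Bigl\langle \frac{m}{2}v^2 \Bigr\rangle_{E/N} = \frac{3}{2}k_B T\,. \tag{4.18}$$
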
Why is this result important? The answer is that it connects two physical theories, 
thermodynamics and Newtonian mechanics! Thermodynamics does not know about 


molecules and Newtonian mechanics does not know about temperature. Yet they can 
be connected! And the constant which gets the dimensions right is $k_B$. In fact, $k_B$ is,
as we already said, essentially the heat capacity of an atom, since heat capacity is in
general the proportionality between energy increase and temperature increase.
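The kinetic pressure formula can be tried out numerically. The following sketch is only an illustration under stated assumptions: it takes the velocity components to be Gaussian with variance $k_BT/m$ (the Maxwellian form, which at this point of the argument is an assumption), uses dimensionless units, and the function name `kinetic_pressure` and all parameter values are ours. It estimates $p = (N/V)\,\langle m v_x^2\rangle$ by sampling and compares it with the ideal gas value $Nk_BT/V$.

```python
import random

def kinetic_pressure(n_particles, volume, mass, kT, samples=200000, seed=1):
    """Estimate p = (N/V) * <m v_x^2> by sampling v_x from an assumed
    Gaussian velocity density with variance kT/m."""
    rng = random.Random(seed)
    sigma = (kT / mass) ** 0.5  # standard deviation of one velocity component
    mean_mvx2 = sum(mass * rng.gauss(0.0, sigma) ** 2 for _ in range(samples)) / samples
    return n_particles * mean_mvx2 / volume

n_particles, volume, mass, kT = 1000, 2.0, 1.0, 1.5
p_kinetic = kinetic_pressure(n_particles, volume, mass, kT)
p_ideal = n_particles * kT / volume  # ideal gas law (4.16)
print(p_kinetic, p_ideal)
```

The agreement does not depend on the mass, in line with the observation that only $\langle m v_x^2\rangle = k_BT$ enters.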

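We now return to the question of what the marginal distribution $f_c$ looks like.
The answer is surprising! We wish to compute

$$P_E(A) = \int_A \frac{d\sigma^{(3N)}_{\sqrt{2E/m}}}{\bigl|S^{(3N)}_{\sqrt{2E/m}}\bigr|} \quad\text{for}\quad A = \Bigl\{(v_1,\dots,v_N) \in S^{(3N)}_{\sqrt{2E/m}} : (v_1)_x \in [a,b]\Bigr\}\,. \tag{4.19}$$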


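Let us simplify the notation. Let

$$S^{(n)}_r = \Bigl\{(x_1,\dots,x_n) \in \mathbb{R}^n : \sum_{i=1}^n x_i^2 = r^2\Bigr\}$$

be the n-dimensional spherical surface, and let $|\cdot|$ denote the normalized surface
measure on $S^{(n)}_r$. Furthermore, let

$$S^{(n)}_r(a,b) = \Bigl\{(x_1,\dots,x_n) \in \mathbb{R}^n : \sum_{i=1}^n x_i^2 = r^2,\ a < x_1 < b\Bigr\}$$

be the spherical zone defined by the interval $(a,b)$ (see Fig. 4.8). We shall prove the
following geometrical fact:

Lemma 4.1. Let $\sigma^2 > 0$ and let $a < b \in \mathbb{R}$. Then the following holds:

$$\lim_{n\to\infty} \frac{\bigl|S^{(n)}_{\sqrt{n\sigma^2}}(a,b)\bigr|}{\bigl|S^{(n)}_{\sqrt{n\sigma^2}}\bigr|} = \int_a^b \frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\,dx\,. \tag{4.20}$$

Before proving that let us translate the result back in terms of (4.19). We have $n = 3N$,
$E/N = 3k_BT/2$, and thus

$$r^2 = \frac{2E}{m} = n\,\frac{k_BT}{m}\,,$$

that is, $\sigma^2 = k_BT/m$. Therefore the marginal distribution we are after turns out to
be the Maxwellian

$$\lim_{\substack{N\to\infty \\ E/N = 3k_BT/2}} P_E\Bigl(\Bigl\{(v_1,\dots,v_N) \in S^{(3N)}_{\sqrt{2E/m}} : (v_1)_x \in [a,b]\Bigr\}\Bigr) = \int_a^b \frac{e^{-mv^2/2k_BT}}{\sqrt{2\pi k_BT/m}}\,dv\,.$$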
But this is the canonical distribution! It is a straightforward matter, once one understands
the proof of the lemma, to conclude that the marginal distribution for more
velocity components is the product of the single-component distributions. Furthermore,
as an exercise, compute the mean kinetic energy of the particle! This must be
$3k_BT/2$.
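The emergence of the Maxwellian from the uniform measure on a high-dimensional sphere can also be observed by sampling. The sketch below is a Monte Carlo illustration under our own choices (dimension, sample size, seed and function names are hypothetical): it draws points uniformly from the sphere of radius $\sqrt{n\sigma^2}$ by normalizing Gaussian vectors, which is a standard sampling technique and not the method of the text, and compares the fraction with first coordinate in $[a,b]$ to the Gaussian integral in (4.20).

```python
import math
import random

def marginal_on_sphere(n, sigma, a, b, samples=4000, seed=0):
    """Fraction of uniformly sampled points on the sphere of radius sqrt(n)*sigma
    in R^n whose first coordinate lies in [a, b]."""
    rng = random.Random(seed)
    radius = math.sqrt(n) * sigma
    hits = 0
    for _ in range(samples):
        # A normalized Gaussian vector is uniformly distributed on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in g))
        x1 = radius * g[0] / norm
        if a <= x1 <= b:
            hits += 1
    return hits / samples

def gaussian_probability(sigma, a, b):
    """Integral of the centered Gaussian density with variance sigma^2 over [a, b]."""
    z = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf(b / z) - math.erf(a / z))

estimate = marginal_on_sphere(n=300, sigma=1.0, a=-1.0, b=1.0)
limit_value = gaussian_probability(1.0, -1.0, 1.0)
print(estimate, limit_value)
```

With $\sigma^2 = k_BT/m$ and $n = 3N$ the same comparison reads as a statement about one velocity component of one gas particle.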


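Proof of the Lemma. The most demanding step of the proof is to convince oneself
that

$$\bigl|S^{(n)}_r\bigr| = r^{n-1}\,\bigl|S^{(n)}_1\bigr|\,. \tag{4.21}$$

Intuitively, this is clear if one thinks of the spherical surfaces in 2 and 3 dimensions.
The straightforward way to get this rigorously is to introduce n-dimensional spherical
coordinates and write down the parameterized surface element, or to simply
recall the general formula (4.10), which in this case reads

$$d\sigma^{(n)}_r = \frac{r}{\sqrt{r^2 - x_1^2 - \dots - x_{n-1}^2}}\,dx_1\cdots dx_{n-1}\,.$$

Now change variables $y_k = x_k/r$ to get

$$d\sigma^{(n)}_r = \frac{r\,r^{n-1}}{r\sqrt{1 - y_1^2 - \dots - y_{n-1}^2}}\,dy_1\cdots dy_{n-1} = r^{n-1}\,d\sigma^{(n)}_1\,,$$

from which (4.21) follows. Assuming $|a| < r$ and $|b| < r$ and recalling (4.11), we
have

$$\begin{aligned}
\bigl|S^{(n)}_r(a,b)\bigr|
&= \int_a^b dx_1 \int_{-\infty}^\infty dx_2 \cdots \int_{-\infty}^\infty dx_n\; 2r\,\delta\bigl(x_1^2 + \dots + x_n^2 - r^2\bigr) \\
&= \int_a^b dx_1 \int_{-\infty}^\infty dx_2 \cdots \int_{-\infty}^\infty dx_n\; 2r\,\delta\bigl(x_2^2 + \dots + x_n^2 - (r^2 - x_1^2)\bigr) \\
&= 2r \int_a^b dx_1\,\frac{1}{2\sqrt{r^2 - x_1^2}}\,\Bigl|S^{(n-1)}_{\sqrt{r^2 - x_1^2}}\Bigr| \qquad \text{[by (4.11)]} \\
&= r\,\bigl|S^{(n-1)}_1\bigr| \int_a^b dx\,\frac{\bigl(\sqrt{r^2 - x^2}\bigr)^{n-2}}{\sqrt{r^2 - x^2}} \\
&= r\,\bigl|S^{(n-1)}_1\bigr| \int_a^b dx\,\bigl(\sqrt{r^2 - x^2}\bigr)^{n-3}\,.
\end{aligned}$$

Once again using (4.21), we then obtain

$$\begin{aligned}
\frac{\bigl|S^{(n)}_r(a,b)\bigr|}{\bigl|S^{(n)}_r\bigr|}
&= \frac{\bigl|S^{(n-1)}_1\bigr|}{\bigl|S^{(n)}_1\bigr|}\,\frac{1}{r^{n-2}} \int_a^b \bigl(\sqrt{r^2 - x^2}\bigr)^{n-3}\,dx \\
&= \frac{\bigl|S^{(n-1)}_1\bigr|}{\bigl|S^{(n)}_1\bigr|}\,\frac{1}{r} \int_a^b \Bigl[1 - \Bigl(\frac{x}{r}\Bigr)^2\Bigr]^{(n-3)/2} dx\,.
\end{aligned}$$

We now use the fact that the fraction on the left becomes unity for $a = -r$, $b = r$,
which implies that the factor in front of the integral on the right is nothing but a
normalization factor, whence

$$\frac{\bigl|S^{(n)}_r(a,b)\bigr|}{\bigl|S^{(n)}_r\bigr|} = \frac{\int_a^b \bigl[1 - (x/r)^2\bigr]^{(n-3)/2}\,dx}{\int_{-r}^{r} \bigl[1 - (x/r)^2\bigr]^{(n-3)/2}\,dx}\,.$$

Now comes a tiny piece of analysis. We wish to evaluate the expression for $r^2 = n\sigma^2$
as $n \to \infty$, and for that we wish to pass the limit inside the integral. This would be
done rigorously by appealing to Lebesgue dominated convergence, observing that
$1 - x \le e^{-x}$ and hence

$$\Bigl(1 - \frac{x^2}{n\sigma^2}\Bigr)^{(n-3)/2} \le \exp\Bigl(-\frac{x^2}{n\sigma^2}\,\frac{n-3}{2}\Bigr)\,,$$

which is integrable for $n > 3$. So we need only take the limit on the integrand

$$\lim_{n\to\infty}\Bigl(1 - \frac{x^2}{n\sigma^2}\Bigr)^{(n-3)/2} = e^{-x^2/2\sigma^2}$$

and normalize the Gaussian. Thus the lemma is proven. If one feels uncomfortable
with $\delta$-functions, one can of course do without and compute the zone as in Fig. 4.8.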
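The last formula of the proof lends itself to a direct numerical check of Lemma 4.1. The following sketch evaluates the ratio of the two integrals over $[1 - (x/r)^2]^{(n-3)/2}$ with a simple midpoint rule for $r^2 = n\sigma^2$ and compares it with the Gaussian limit; the function names, the quadrature rule and the chosen values of n are our assumptions.

```python
import math

def zone_fraction(n, sigma, a, b, steps=20000):
    """Relative size of the spherical zone S_r^(n)(a, b) for r^2 = n * sigma^2,
    computed from the integral representation in the proof of Lemma 4.1."""
    r = math.sqrt(n) * sigma
    power = (n - 3) / 2.0

    def integrand(x):
        # max() guards against tiny negative values from rounding at |x| = r.
        return max(0.0, 1.0 - (x / r) ** 2) ** power

    def midpoint(lo, hi):
        h = (hi - lo) / steps
        return h * sum(integrand(lo + (i + 0.5) * h) for i in range(steps))

    return midpoint(max(a, -r), min(b, r)) / midpoint(-r, r)

def gaussian_fraction(sigma, a, b):
    """Right-hand side of (4.20)."""
    z = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf(b / z) - math.erf(a / z))

target = gaussian_fraction(1.0, -1.0, 1.0)
error_small_n = abs(zone_fraction(10, 1.0, -1.0, 1.0) - target)
error_large_n = abs(zone_fraction(1000, 1.0, -1.0, 1.0) - target)
print(error_small_n, error_large_n)
```

The deviation from the Gaussian value shrinks as n grows, as the dominated convergence argument asserts.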

Fig. 4.8 Computing the spherical surface zone of a three-dimensional sphere of radius $r$, thereby
recalling the famous identity that the $(a,b)$ zone is the cylindrical surface of the cylinder with base
size $2\pi r$ and height $b - a$. The base size of the cylinder is now $\bigl|S^{(n-1)}_{\sqrt{r^2 - x^2}}\bigr|$ and its height is $dz$,
yielding $\bigl|S^{(n)}_r(dx)\bigr| = \bigl|S^{(n-1)}_{\sqrt{r^2 - x^2}}\bigr|\,dz$, and by similarity of the triangles with sides $x$, $r$ and $dx$, $dz$, we
have the proportionality $dz : dx = r : \sqrt{r^2 - x^2}$

What moral should we draw from this? The measure of typicality for a subsystem
of a large system, which is typical with respect to the microcanonical ensemble,
should be the canonical ensemble, if interaction between the subsystem and the
environment is small. So let us push this a bit further. Let us return to the general
microcanonical measure (4.8) given by

$$\frac{1}{Z(E)}\,\delta\bigl(H(q,p) - E\bigr)\,d^{3N}q\,d^{3N}p = \frac{1}{Z(E)}\,\frac{d\sigma_E}{\|\nabla H\|}\,,$$

and recall that for the ideal gas (4.13), where the volume factor $V^N$ is irrelevant,
and setting $n = 3N$,

$$Z(E) = \int_{\Omega_E} \frac{d\sigma_E}{\|\nabla H\|} = \frac{\bigl|S^{(n)}_{\sqrt{2mE}}\bigr|}{\sqrt{2E/m}} \sim \bigl(\sqrt{E}\bigr)^{\,n-2}\,,$$

with $E/n = k_BT/2$, whence $Z(E)$ is a huge exponential! We therefore make it
smaller by taking the logarithm,

$$\ln Z(E) = \frac{n-2}{2}\,\ln E + O(n)\,,$$

then take the derivative with respect to $E$ and use

$$\frac{E}{n} = \frac{k_BT}{2}$$

to obtain

$$\frac{d\ln Z(E)}{dE} = \frac{n-2}{2}\,\frac{1}{E} \approx \frac{n}{2E} = \frac{1}{k_BT}\,. \tag{4.22}$$

This result suggests a microscopic definition of Clausius’ entropy $S$ for which the
thermodynamic relation

$$\frac{\partial S}{\partial E} = \frac{1}{T}$$

will hold. So setting

$$S = k_B \ln Z(E) \tag{4.23}$$

hits close to home. This is Boltzmann’s setting, and we shall henceforth take this as
valid in general (not only for the ideal gas).¹¹
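Relation (4.22) is easy to confirm with a finite difference. The sketch below assumes the ideal gas form $\ln Z(E) = \frac{n-2}{2}\ln E$ up to an $E$-independent constant, uses units with $k_B = 1$, and picks illustrative values for the particle number and temperature; it differentiates $\ln Z(E)$ numerically at $E = nk_BT/2$ and compares with $1/k_BT$.

```python
import math

def log_z_ideal_gas(energy, n):
    """ln Z(E) up to an E-independent constant, using Z(E) ~ E^((n-2)/2)."""
    return 0.5 * (n - 2) * math.log(energy)

def inverse_temperature(energy, n, rel_step=1e-6):
    """Central finite difference for d ln Z(E) / dE."""
    h = energy * rel_step
    return (log_z_ideal_gas(energy + h, n) - log_z_ideal_gas(energy - h, n)) / (2 * h)

kT = 2.0                 # temperature in energy units (k_B = 1), illustrative value
n = 3 * 10**6            # n = 3N degrees of freedom
energy = n * kT / 2      # E/n = k_B T / 2
beta = inverse_temperature(energy, n)
print(beta, 1 / kT)
```

With $S = k_B\ln Z(E)$ the printed value is $\partial S/\partial E$ in units of $k_B$, which is how the thermodynamic relation $\partial S/\partial E = 1/T$ is recovered.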


11 A bit of unimportant history. Boltzmann never wrote this formula down, although it is inscribed
on his tombstone. Planck wrote the formula (1900), introducing the Boltzmann constant $k_B$ in his
work on black body radiation (see Chap. 6). Einstein used that formula in reverse. Thinking of the
size of regions in phase space as the probability for a microstate to be in that region, he wrote (we
shall say more on this below) that the probability of fluctuation $\sim \exp(\Delta\text{entropy}/k_B)$.

But we want to focus first on the moral above. We imagine the subsystem to
consist of $N_1$ particles with phase space coordinates

$$(q_1, p_1) = \bigl(q_1^1,\dots,q_1^{N_1},\,p_1^1,\dots,p_1^{N_1}\bigr)\,,$$

and the rest of the large system to consist of $N_2$ particles with phase space coordinates
$(q_2, p_2)$. So the phase points of the large system are naturally split into two
coordinates: $(q,p) = \bigl((q_1,p_1),(q_2,p_2)\bigr)$. Suppose also that the Hamiltonian agrees
with the splitting, i.e., $H(q,p) \approx H_1(q_1,p_1) + H_2(q_2,p_2)$, which means that the interaction
energy between the subsystem and the environment is small compared to
the energies $E_1$ and $E_2$. We shall assume that $N_2 \gg N_1$. Then as before for the ideal
gas case, we write

$$\begin{aligned}
P_E\bigl(\{((q_1,p_1),(q_2,p_2)) \mid (q_1,p_1) \in A\}\bigr)
&= \frac{1}{Z(E)} \int_A d^{3N_1}q_1\,d^{3N_1}p_1 \int d^{3N_2}q_2\,d^{3N_2}p_2\;\delta\bigl(H_1 + H_2 - E\bigr) \\
&= \frac{1}{Z(E)} \int_A d^{3N_1}q_1\,d^{3N_1}p_1 \int d^{3N_2}q_2\,d^{3N_2}p_2\;\delta\bigl(H_2 - (E - H_1)\bigr) \\
&= \frac{1}{Z(E)} \int_A d^{3N_1}q_1\,d^{3N_1}p_1\;Z_2\bigl(E - H_1(q_1,p_1)\bigr)\,.
\end{aligned}$$

As in the ideal gas case we wish to control $Z_2(E - H_1)/Z(E)$ for $E/N$ = constant,
when $N$ gets large. Let us quickly check on the ideal gas to see how to proceed.
Roughly $Z(E) \sim E^N$ (actually the exponent is $N/2$, but we simply call it $N$ to ease
notation) and $Z_2(E - H_1) \sim (E - H_1)^{N_2}$, where $N_2 \approx N$. Therefore, expanding $Z_2$
around $E$, which we think of as being large compared to the range of typical values
of $H_1$, yields

$$\begin{aligned}
\frac{Z_2(E - H_1)}{Z(E)} \approx \frac{(E - H_1)^N}{E^N}
&= \frac{E^N}{E^N} - \frac{N E^{N-1} H_1}{E^N} + \frac{1}{2}\,\frac{N(N-1)E^{N-2}H_1^2}{E^N} - \dots \\
&\approx 1 - \frac{N}{E}\,H_1 + \frac{1}{2}\,\frac{N^2}{E^2}\,H_1^2 - \dots\,.
\end{aligned}$$

Since $E/N$ is approximately the mean energy of one particle, which is of the order
of the typical value $H_1$ achieves, the important observation is now that all terms in
the above expansion are of order 1. So this expansion is no good. The terms are too
big. Therefore let us take the logarithm:
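$$\ln\frac{Z_2(E - H_1)}{Z(E)} \approx \ln\frac{(E - H_1)^N}{E^N} = N\ln\Bigl(1 - \frac{H_1}{E}\Bigr) = -N\,\frac{H_1}{E} - \frac{1}{2}\,N\,\frac{H_1^2}{E^2} - \dots\,,$$

where the second order term (and all following terms) is now of smaller order in
$1/N$ than the first term. So this is looking better. Expanding in $H_1$ yields

$$\begin{aligned}
\ln\frac{Z_2(E - H_1)}{Z(E)} &= \ln Z_2(E - H_1) - \ln Z(E) \\
&\approx \ln\frac{Z_2(E)}{Z(E)} - \frac{d\ln Z_2(E)}{dE}\,H_1 \\
&= \ln\frac{Z_2(E)}{Z(E)} - \frac{1}{k_BT}\,H_1 \qquad \text{[by (4.22)]}\,.
\end{aligned}$$

The first term on the right becomes the normalization factor, i.e., we obtain the
normalized canonical distribution

$$\frac{Z_2(E - H_1)}{Z(E)} \approx \frac{e^{-H_1/k_BT}}{Z_1(T)}\,,$$

or in other words

$$\begin{aligned}
P_E\bigl(\{((q_1,p_1),(q_2,p_2)) \mid (q_1,p_1) \in A\}\bigr)
&\approx \int_A d^{3N_1}q_1\,d^{3N_1}p_1\;\frac{e^{-H_1(q_1,p_1)/k_BT}}{Z_1(T)} \\
&= \int_A d^{3N_1}q_1\,d^{3N_1}p_1\;\rho_T(q_1,p_1)\,,
\end{aligned}$$

which is the canonical distribution (4.7).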
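The reason for expanding the logarithm rather than the ratio itself shows up clearly in numbers. The sketch below uses the simplified ideal gas form $(E - H_1)^N/E^N$ with $E = N\varepsilon$ for a fixed energy per particle $\varepsilon$ (the symbol $\varepsilon$ and the function names are our notation) and compares the exact logarithm with its first-order term $-NH_1/E = -H_1/\varepsilon$ for growing $N$.

```python
import math

def log_ratio_exact(h1, n, energy_per_particle):
    """ln[(E - H1)^N / E^N] with E = N * energy_per_particle."""
    return n * math.log1p(-h1 / (n * energy_per_particle))

def log_ratio_canonical(h1, energy_per_particle):
    """First-order term -N * H1 / E, which does not depend on N at fixed E/N."""
    return -h1 / energy_per_particle

h1 = 2.0    # subsystem energy, of the order of the energy per particle
eps = 1.0   # E/N, playing the role of k_B T in this simplified model
gaps = [abs(log_ratio_exact(h1, n, eps) - log_ratio_canonical(h1, eps))
        for n in (10**2, 10**4, 10**6)]
print(gaps)
```

The gap decays like $H_1^2/(2N\varepsilon^2)$, whereas the terms $(N/E)H_1$ and $\frac{1}{2}(N^2/E^2)H_1^2$ of the naive expansion both equal 2 here, independently of $N$.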

Since we have come this far now, let us briefly consider the Gibbs formalism 
to highlight the difference between the Gibbs entropy and the Boltzmann entropy. 
First note that the normalization Z(T) of the canonical measure with Hamiltonian 
H (commonly referred to as the partition function) can be written in the form 


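$$\begin{aligned}
Z(T) &= \int d^{3N}q\,d^{3N}p\;e^{-H(q,p)/k_BT} \\
&= \int dE\;e^{-E/k_BT} \int d^{3N}q\,d^{3N}p\;\delta\bigl(E - H(q,p)\bigr) \\
&= \int dE\;Z(E)\,e^{-E/k_BT}\,.
\end{aligned}$$

Therefore the expectation value of $E$ is

$$\langle E\rangle_T = \frac{1}{Z(T)} \int E\,Z(E)\,e^{-E/k_BT}\,dE = \int E\,\rho_T(E)\,dE\,, \tag{4.24}$$

where

$$\rho_T(E) = \frac{Z(E)}{Z(T)}\,e^{-E/k_BT}\,.$$
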
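The $T$-derivative of this is the heat capacity, which is readily computed from the
definition of $Z(T)$:

$$C = \frac{d\langle E\rangle_T}{dT} = \frac{1}{k_BT^2}\bigl(\langle E^2\rangle_T - \langle E\rangle_T^2\bigr) = \frac{1}{k_BT^2}\,\langle\Delta E^2\rangle_T\,.$$

Assume for simplicity that the heat capacity is constant, i.e., $\langle E\rangle_T = CT$. Then the
above variance is simply $\langle\Delta E^2\rangle_T = Ck_BT^2$, and for the relative variance, we obtain
the ratio between the heat capacity of an atom and that of the system:

$$\frac{\langle\Delta E^2\rangle_T}{\langle E\rangle_T^2} = \frac{Ck_BT^2}{C^2T^2} = \frac{k_B}{C}\,.$$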


This noteworthy and famous result says that, since $C \sim Nk_B$ when there are $N$ atoms
in the system, the relative energy fluctuation is negligible when $N$ gets large. This
means that the typical phase points all lie in a relatively thin shell around the mean
value $\bar E := \langle E\rangle_T$. This in turn suggests that typicality is as well expressed by the
microcanonical ensemble on the surface $\Omega_{\bar E}$. This is known as the equivalence of
ensembles. It can be sharpened to a rigorous statement by taking the thermodynamic
limit of infinitely large systems $N \to \infty$, $V/N$ = constant.
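The shrinking of the relative energy fluctuation can be seen by sampling. The sketch below assumes a monatomic ideal gas, for which $Z(E) \sim E^{3N/2 - 1}$ makes the canonical energy density $\rho_T(E)$ a Gamma density with shape $3N/2$ and scale $k_BT$, and for which $C = \frac{3}{2}Nk_B$; sample size, seed and names are illustrative. It estimates the relative variance and compares it with $k_B/C = 2/(3N)$.

```python
import random

def relative_energy_variance(n_atoms, kT=1.0, samples=20000, seed=2):
    """Sample energies from rho_T(E) ~ E^(3N/2 - 1) exp(-E/kT), assumed here for a
    monatomic ideal gas, and return variance / mean^2."""
    rng = random.Random(seed)
    shape = 1.5 * n_atoms
    energies = [rng.gammavariate(shape, kT) for _ in range(samples)]
    mean = sum(energies) / samples
    variance = sum((e - mean) ** 2 for e in energies) / samples
    return variance / mean ** 2

for n_atoms in (10, 100, 1000):
    measured = relative_energy_variance(n_atoms)
    predicted = 2.0 / (3.0 * n_atoms)  # k_B / C with C = (3/2) N k_B
    print(n_atoms, measured, predicted)
```

Already for a thousand atoms the energy is concentrated within a few percent of its mean, which is the thin shell referred to above.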

Furthermore this suggests that it is reasonable to approximate the canonical distribution
by a Gaussian with mean $\bar E := \langle E\rangle_T$ and variance $Ck_BT^2$, i.e.,

$$\rho_T(E) = \frac{Z(E)}{Z(T)}\,e^{-E/k_BT} \approx \frac{e^{-(E - \bar E)^2/2Ck_BT^2}}{\sqrt{2\pi Ck_BT^2}}\,.$$

Therefore,

$$\rho_T(\bar E) = \frac{Z(\bar E)}{Z(T)}\,e^{-\bar E/k_BT} \approx \frac{1}{\sqrt{2\pi Ck_BT^2}}\,,$$

whence

$$Z(T) \approx \sqrt{2\pi Ck_BT^2}\;Z(\bar E)\,e^{-\bar E/k_BT}\,,$$

and therefore

$$\ln Z(T) \approx \ln Z(\bar E) - \frac{\bar E}{k_BT} + \ln\sqrt{2\pi Ck_BT^2}\,.$$

Then, using $C \sim N$,

$$\frac{\ln Z(T)}{N} \overset{N\ \text{large}}{\approx} \frac{\ln Z(\bar E)}{N} - \frac{1}{k_BT}\,\frac{\bar E}{N}\,.$$

In this sense,

$$-k_BT\,\ln Z(T) = -k_BT\,\ln Z(\bar E) + \bar E\,, \tag{4.25}$$

and in view of (4.23), this should be identified as the thermodynamic free energy
relation

$$F = -TS + U\,.$$
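The Gaussian approximation of $Z(T)$ can be tested on a model where everything is known in closed form. The sketch below assumes $Z(E) = E^{k-1}$ with $k = 3N/2$ and units with $k_B = 1$, so that $Z(T) = \Gamma(k)\,T^k$, $\bar E = kT$ and $C = k$; these choices and the function names are ours. It compares $\ln Z(T)$ with $\ln Z(\bar E) - \bar E/T + \ln\sqrt{2\pi CT^2}$ and shows that the correction term is negligible per particle.

```python
import math

def log_z_canonical(k, T):
    """ln Z(T) for Z(T) = integral of E^(k-1) exp(-E/T) dE = Gamma(k) * T^k."""
    return math.lgamma(k) + k * math.log(T)

def log_z_micro_shifted(k, T):
    """ln Z(E_bar) - E_bar / T at the mean energy E_bar = k * T."""
    e_bar = k * T
    return (k - 1) * math.log(e_bar) - e_bar / T

def log_gaussian_width(k, T):
    """ln sqrt(2 pi C T^2) with heat capacity C = k (k_B = 1)."""
    return 0.5 * math.log(2 * math.pi * k * T * T)

n_atoms = 1000
k, T = 1.5 * n_atoms, 1.0
exact = log_z_canonical(k, T)
approx = log_z_micro_shifted(k, T) + log_gaussian_width(k, T)
per_particle_gap = (exact - log_z_micro_shifted(k, T)) / n_atoms
print(exact - approx, per_particle_gap)
```

The first printed number is the error of the Gaussian approximation, the second the size of the term dropped in passing to (4.25), measured per particle.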


We now reverse the argument. Take a microcanonically distributed system with energy
$E$. Then the Boltzmann entropy (the entropy) is $S = k_B \ln Z(E)$. For large systems,
this value is well approximated by $k_B \ln Z(\bar E)$, with the canonically distributed
system with $\langle E\rangle_T = \bar E = E$. By (4.25), we then have

$$\begin{aligned}
S = k_B \ln Z(\bar E) &= k_B \ln Z(T) + \frac{1}{T}\,\bar E = k_B\Bigl[\ln Z(T) + \frac{1}{k_BT}\,\langle E\rangle_T\Bigr] \\
&= -k_B \int \rho_T \ln\rho_T\,d^{3N}q\,d^{3N}p \qquad \text{[by (4.24)]}\,,
\end{aligned}$$

where we have reverted to the canonical density notation

$$\rho_T(q,p) = \frac{e^{-H(q,p)/k_BT}}{Z(T)}\,.$$

The right-hand side of that equation is the famous Gibbs entropy

$$S_G = -k_B \int \rho_T \ln\rho_T\,d^{3N}q\,d^{3N}p\,,$$


equal in value to Boltzmann’s entropy (4.23) in equilibrium. Although the two val- 
ues are equal, there is a huge difference between Boltzmann’s entropy and the Gibbs 
entropy, a difference which becomes critical when we turn to the issue of non- 
equilibrium, irreversibility, and the second law. Boltzmann’s entropy is naturally 
a function on phase space. It is constant on the set of typical phase points, where the 
size of the set of typical phase points is essentially Z(E). The Gibbs entropy is not 
defined on phase space, but is rather a functional of distributions on phase space — a 
technically very abstract, but computationally useful tool. 


4.2 Irreversibility 


We understand how randomness enters into physics. It is how a typical universe 
appears. The balls in a Galton board behave in the random way they do because 
that is how they behave in a typical universe. The gas in a container fills the container
in a homogeneous way because that is typical. One thing is strange, however:
Why should there be a Galton board in a typical universe, why a gas container, and 
when we pick up that stone and throw it, how come we can compute its trajectory 
precisely, without any randomness at all? Are our macroscopic experiences not in 
fact non-random in many ways? Where is the place for all the typicality talk of the 
previous section in a universe like ours? 

Unfortunately, this is a hard question, because the truthful answer is that we do 
not know. Our universe is not typical. For our universe there is no justification for 
typicality of microcanonical or canonical distributions, because they are false in 
general. For us it is typical to generate atypical situations. We can build a container 
with a piston which forces all molecules into one half of the container, and we can 
remove the piston and then we have a gas system where all gas molecules at that 
moment are in one half of the container and the other half is empty (at least we can 
do that in principle), as in Fig. 4.1. That is an atypical state. True, the gas did not do 
that by itself. We did it. But that would not help. Shifting the reason to the outside 
ends in atypicality for the largest system conceivable, the universe. Something needs 
to be explained. 

A typical universe is an equilibrium universe, the equilibrium to which our universe
evolves (to a thermal death, as Clausius referred to it). Right now we are still
very far from thermal death. Our universe is atypical or in non-equilibrium, and 
that is why we can build a Galton board or pick up a stone, or look at the moon 
circling the earth. But at the same time we experience the determined evolution to 
equilibrium at every moment and all over the place via the second law of thermo- 
dynamics. The law determines that thermal processes are directed, and that they are 
irreversible, i.e., the time-reversed process does not take place. The breaking of a 
glass is just such a thermal process — no one ever experienced a broken glass be- 
coming whole again on its own. Never will a cold body make a warm body warmer 
on its own. The law says that thermal processes will always run in such a way that 
entropy increases! 

This thermodynamic law is different from all the other laws of physics, which 
are time-reversal invariant. In thermodynamics there is no argument to back this 
law, no further insight to make the law plausible. It is a law that describes what we 
see. Boltzmann explained the law by reducing thermodynamics to Newtonian me- 
chanics. That seems paradoxical, since Newtonian mechanics is time reversible in 
the sense that all mechanical processes can run both ways. Boltzmann’s explanation 
is this. 


4.2.1 Typicality Within Atypicality 


Think of the atypical state of the gas in Fig. 4.1, where only half the container is oc- 
cupied by gas molecules (suppose this has been achieved by a piston which was then 
removed). That is now an atypical state — no way around that. What happens next 
with the gas? Typically the gas system will move in such a way that the container
becomes filled in a homogeneous way and remains in that equilibrium-like state
“forever”.¹² Of course, “typicality” is no longer our clear-cut equilibrium “typicality”.
It is typicality with respect to the measure on phase space which arises from
conditioning under macroscopic constraints (like the piston which pushed the gas 
into one half of the container). In short, it is typicality under given macroscopic 
constraints. That is what remains of Boltzmann’s typicality idea in our real atypical 
world. Things are as typical as possible under given constraints. 
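Typicality under a macroscopic constraint can be illustrated with a toy computation. The sketch below is our own simplified model, not the authors’: non-interacting particles in a one-dimensional box $[0,1]$ with elastic walls, initial positions drawn uniformly from the left half (the constraint left behind by the piston), and Gaussian velocities. It reports the fraction of particles found in the left half at a given time.

```python
import random

def reflect(x):
    """Fold a free-flight position back into the box [0, 1] with elastic walls."""
    x = x % 2.0
    return 2.0 - x if x > 1.0 else x

def left_fraction(t, n=5000, seed=3):
    """Fraction of particles in the left half at time t, for initial conditions
    drawn from the constrained measure (all particles start in [0, 0.5))."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        x0 = 0.5 * rng.random()   # constrained start: left half only
        v = rng.gauss(0.0, 1.0)   # assumed Gaussian velocity
        if reflect(x0 + v * t) < 0.5:
            count += 1
    return count / n

print(left_fraction(0.0), left_fraction(0.2), left_fraction(20.0))
```

For initial conditions that are typical with respect to the constrained measure the fraction relaxes from 1 towards 1/2, although every single trajectory obeys reversible dynamics.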

The typical process of the homogeneous filling of the container when the gas 
is freed from its macroscopic constraints is basic to the famous phenomenological 
Boltzmann equation, an equation which is not time-reversal invariant, but which can 
be derived from the time-reversible microscopic dynamics of gas molecules. The 
Boltzmann equation describes how the typical empirical density (of a constrained 
low density gas) changes in time to a typical density of equilibrium. That is a typical 
lawlike behavior (strongly analogous to the law of large numbers) on which the 
second law is based. The second law is in that sense a macroscopic law, which holds 
for a particular set of atypical initial conditions of the universe. We do not know by 
which fundamental physical law this set is selected, nor do we know exactly what 
the set looks like. All we know is that it is a very special set given by macroscopic 
constraints which we can to some extent infer from our knowledge about the present 
status of our universe (we shall expand a bit on that at the end of the chapter). But 
that should not in any way diminish our respect for Boltzmann’s insight that the 
second law can be reduced to and therefore explained by fundamental microscopic 
laws of physics. 

We shall say a bit more about the second law and entropy increase, and the way 
irreversible behavior can arise from reversible behavior, or in other words, how one 
can turn apples into pears. As we said, the key here is special initial conditions. We 
return to the notions of micro- and macrostate, since entropy is a thermodynamic 
notion. Entropy arises from the first and second laws of thermodynamics (when the 
latter is phrased in terms of the impossibility of a perpetuum mobile) as a thermodynamic
state function $\mathcal{S}$ which reads in differential form

$$d\mathcal{S} = \frac{1}{T}\bigl(d\mathcal{E} + \mathcal{P}\,dV\bigr)\,, \tag{4.26}$$

with energy $\mathcal{E}$, pressure $\mathcal{P}$, and volume $V$.

A thermodynamic state (or macrostate) is determined by a very small number of 
thermodynamic variables: volume, density, temperature, pressure, energy, to name 
a few. According to kinetic gas theory, or in more modern terms, according to sta- 
tistical mechanics, the extensive variables (those which grow with the size of the 
system) are functions on the phase space $\Omega$ of the system under consideration, i.e.,
extreme coarse-graining functions (random variables in other words), which do not
fluctuate, because they come about as empirical means (the law of large numbers
acting again). An example is the (empirical) density $\rho(x)$, the average number of


12 The quotation marks on “forever” will become clear further down.
particles per volume:

$$\rho(x)\,d^3x = \frac{1}{N}\sum_{i=1}^N \delta(x - q_i)\,d^3x\,, \tag{4.27}$$

where $N$ is the very large particle number of the system and the $q_i$ are the actual
particle positions. The values that this macroscopic $\rho$ achieves, and which characterize
the macrostate Ma, partition the phase space into large cells, so that very
many phase points (microstates $\omega =: \mathrm{Mi(Ma)} \in \Omega$) realize the macrostate. We
call the size of the cell the phase space volume of the macrostate: $W(\mathrm{Ma})$. A simple
mathematical example is provided by the Rademacher functions (4.3). Consider the
empirical mean

$$\rho_n(x) = \frac{1}{n}\sum_{k=1}^n r_k(x) \in (0,1)\,. \tag{4.28}$$

This is certainly a macroscopic variable which coarse-grains the interval $[0,1]$. We
shall prove in (4.2) the rather intuitive assertion that typically

$$\rho_n(x) \overset{n\ \text{large}}{\approx} \frac{1}{2}\,,$$

so that a very large subset (with respect to the natural notion of size as the “length”
of the set) of points in $[0,1]$ (microstates) yields the macrostate corresponding to
value $\approx 1/2$, in other words $W(\approx 1/2) \approx 1$. The macrostate corresponding to the
value 1/4, for example, would be realized by only a very tiny subset in $[0,1]$.
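The sizes of these cells can be computed exactly by counting. The sketch below assumes that the Rademacher functions give the binary digits of $x$, so that the set of $x \in [0,1]$ with a prescribed number of ones among the first $n$ digits has length $\binom{n}{k}/2^n$; the function name `cell_size` and the chosen windows are ours.

```python
from math import comb

def cell_size(n, low, high):
    """Length of the set {x in [0,1] : low <= rho_n(x) <= high}, i.e. the fraction
    of binary digit strings of length n whose mean lies in [low, high]."""
    total = sum(comb(n, k) for k in range(n + 1) if low <= k / n <= high)
    return total / 2 ** n

n = 1000
size_near_half = cell_size(n, 0.45, 0.55)      # macrostate "rho_n is about 1/2"
size_near_quarter = cell_size(n, 0.20, 0.30)   # macrostate "rho_n is about 1/4"
print(size_near_half, size_near_quarter)
```

The first cell nearly exhausts the interval, while the second has a length that is for all practical purposes zero, which is the sense in which $W(\approx 1/2) \approx 1$.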

Entropy is extensive, but it was not originally realized as a function on phase space.
It was Boltzmann who achieved that. The entropy $\mathcal{S}$ of a system in the macrostate
Ma with phase space volume $W(\mathrm{Ma})$ [= “number” of microstates Mi(Ma), which
realize Ma] is determined by that “number” $W(\mathrm{Ma})$. This is in fact what (4.23)
says, and the entropy becomes in this way a function on the microstates: a function
which is constant on the cell corresponding to the macrostate.

The Boltzmann entropy is¹³

$$S_B\bigl(\mathrm{Mi(Ma)}\bigr) = k_B \ln W(\mathrm{Ma}) \quad \bigl[\,= \mathcal{S}(\mathcal{E}, V)\,\bigr]\,. \tag{4.30}$$


Fig. 4.9 The evolution to equilibrium depicted in physical space on the left and in phase space on
the right: microscopically and macroscopically. Mi1: The system starts with all gas molecules on
the left side of the container. Mi2: The corresponding phase point, the microstate, is $\omega$. Ma1 shows
the macrostate, of which Mi1 is one realization. Ma2 shows the set $\mathrm{Ma}(V_1)$ of all microstates
which realize Ma1. (Not to scale! It would not be visible on the proper scale.) The time evolution
on phase space is denoted by $\Phi_t$. In Mi3, the trajectory $(\Phi_s\omega)_{s\in[0,t]}$ is depicted. In the course of
time, the gas fills the whole volume. Ma3 shows the uniform density, as in equilibrium. Ma5 shows
the time-evolved set $\Phi_t\mathrm{Ma}(V_1)$ in which $\Phi_t\omega$ lies. That set of microstates is perfectly mixed up
with the equilibrium set and realizes for all practical purposes a macrostate, which is also realized
by the huge set of equilibrium microstates $\mathrm{Ma}(V)$, i.e., for all practical purposes $\Phi_t\mathrm{Ma}(V_1)$ can be
replaced by $\mathrm{Ma}(V)$. Nevertheless, the volume $|\mathrm{Ma}(V_1)|$ of $\mathrm{Ma}(V_1)$ is very small and, by Liouville’s
theorem, $|\mathrm{Ma}(V_1)| = |\Phi_t\mathrm{Ma}(V_1)| \ll |\mathrm{Ma}(V)|$

13 $W(\mathrm{Ma})$ must actually be normalized by $N!$ for the entropy to become extensive. Consider the
phase space volume of the following macrostate of an ideal gas in a volume $V$. We have $N$ particles
and they are distributed over one-particle phase space cells $\alpha_i$, $i = 1,\ldots,m$, so that there are $N_i$
particles in cell $\alpha_i$, with $|\alpha_i| = \alpha$, $i = 1,\ldots,m$. How large is that phase space volume? There are
$N!/(N_1!N_2!\cdots N_m!)$ possibilities for such distributions, and for each possibility the particles have
volume $\alpha$ to move in, whence

$$W = \frac{N!}{N_1!N_2!\cdots N_m!}\,\alpha^N. \tag{4.29}$$

With Stirling’s formula (4.1), $n! \approx n^n$, and $N = \sum_{i=1}^{m} N_i$, this gives

$$\begin{aligned}
\frac{S}{k_\mathrm{B}} = \ln W &= \ln\frac{N!}{N_1!\cdots N_m!} + N\ln\alpha \approx N\ln N - \sum_{i=1}^{m} N_i\ln N_i + N\ln\alpha \\
&= -\sum_{i=1}^{m} N_i(\ln N_i - \ln N) + N\ln\alpha = -N\sum_{i=1}^{m}\frac{N_i}{N}\ln\frac{N_i}{N} + N\ln\alpha.
\end{aligned}$$

The thermodynamic entropy (as invented by Clausius) is extensive, which means it is additive.
If we remove from a gas container a separating wall which separated two ideal gases, then the
entropy of the “after system” will become the sum of the entropies of the “before systems”. Our
definition does not obey that, but it does when we cancel the $N$-dependence, i.e., when we consider
$\widetilde{W} := W/N!$ and set $S = k_\mathrm{B}\ln\widetilde{W}$.
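The Stirling estimate used in footnote 13 can be checked numerically. The sketch below compares the exact logarithm of the multinomial coefficient, computed with the log-gamma function, against the leading-order expression $-N\sum_i (N_i/N)\ln(N_i/N)$. The occupation numbers are arbitrary illustrative values, the cell volume is set to 1 so that no $\ln\alpha$ term appears, and the function names are hypothetical.

```python
from math import lgamma, log

def ln_multinomial(counts):
    """Exact ln( N! / (N_1! ... N_m!) ) via the log-gamma function."""
    total = sum(counts)
    return lgamma(total + 1) - sum(lgamma(c + 1) for c in counts)

def stirling_estimate(counts):
    """Leading-order estimate -N * sum_i (N_i/N) ln(N_i/N); empty cells contribute zero."""
    total = sum(counts)
    return -sum(c * log(c / total) for c in counts if c > 0)

counts = [500_000, 300_000, 200_000]     # illustrative occupation numbers N_i
exact = ln_multinomial(counts)
approx = stirling_estimate(counts)
print(exact, approx, (approx - exact) / exact)
```

The crude form $n! \approx n^n$ is sufficient here because the terms linear in $n$ that it drops cancel between $N!$ and the product of the $N_i!$; the remaining discrepancy is only logarithmic in $N$.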
Do we now understand why entropy increases? Yes, by simple combinatorics.
Lifting macroscopic constraints like taking away the piston which held the gas 
molecules on the left side of the container in Fig. 4.1, the new macrostates will 
be those which correspond to larger and larger phase space volumes, i.e., for which 
the number of microstates which realize the macrostate becomes larger and larger. 
This is all depicted in Fig. 4.9. The number of microstates does not change under 
the Hamiltonian time evolution, by Liouville’s theorem. The gas extends over the 
whole container, yet the corresponding phase space region remains as small as the 
one it started in. 

So does that mean that the entropy does not increase then? Where did our reasoning
go wrong? Nowhere! We only have to accept that Boltzmann’s definition (4.30)
of the entropy is more subtle than we imagined at first sight. We must always consider
the macrostate first! The new macrostate $\mathrm{Ma}(V)$ has a very much larger phase
space volume $W(\mathrm{Ma}(V))$, and the evolved microstate $\Phi_t\omega$ realizes this macrostate
(for all practical purposes). There is a reason for the parenthetical addition of the
phrase “for all practical purposes”, which will be addressed later. But it is nothing
we should be concerned with.

Here is a very simple mathematical example which may be helpful to understand
the definition of entropy with regard to micro- and macrostates. Let $\Omega = [0,1]$ and
consider the time-one map $Tx = 2x \bmod 1$ (see Fig. 4.10). The action of $T$ is most easily
understood when we write $x \in [0,1]$ in the binary representation. Then, with $r_k = r_k(x)$,

$$x = 0.r_1r_2r_3\ldots \quad\Longrightarrow\quad Tx = 0.r_2r_3r_4\ldots\,.$$

In other words,

$$r_k(Tx) = r_{k+1}(x). \tag{4.31}$$

The map is obviously not invertible (so the $n$-fold composition of the map defines
irreversible dynamics, different from Hamiltonian dynamics). We look again at the
macro-variable (4.28)

$$\rho = \frac{1}{n}\sum_{k=1}^{n} r_k : [0,1] \to [0,1],$$


and we shall show in Theorem 4.2 that for $n$ large this is typically close to 1/2. We
observe that a microstate $x \in [0,1/2^n)$ corresponds to the macrostate Ma given by
$\rho = 0$. Hence all microstates realizing Ma with $\rho = 0$ make up the set with $r_k = 0$
for $k = 1,2,\ldots,n$. Therefore

$$S = k_\mathrm{B}\ln W(\mathrm{Ma}(\rho = 0)) = k_\mathrm{B}\ln\bigl|[0,1/2^n)\bigr| = -k_\mathrm{B}\,n\ln 2.$$

Now for $n$ very large, $x \in [0,1/2^n)$ is atypical for a number in $[0,1]$, since the $x$
starts with a lot of zeros. But for a conditionally typical $x \in [0,1/2^n)$, i.e., typicality
defined with respect to the conditional measure $2^n|A|$, $A \subset [0,1/2^n)$, we should see,
from some large place in the binary expansion onwards, that zeros and ones occur
in an irregular but lawlike manner, in the sense of the law of large numbers. In fact
(we only look at discrete times, but that does no harm) with

$$T^j\bigl([0,2^{-n})\bigr) = T\circ T\circ\cdots\circ T\bigl([0,2^{-n})\bigr) = [0,1), \qquad j \ge n,$$

we have that, typically with respect to the conditional measure,

$$\rho_j(x) = \rho(T^jx) = \frac{1}{n}\sum_{k=1}^{n} r_k(T^jx) = \frac{1}{n}\sum_{k=1}^{n} r_{k+j}(x) \overset{n\ \text{large}}{\approx} \frac{1}{2}, \qquad j \ge n,$$

as we shall show in Theorem 4.2. But Ma corresponding to $\rho = 1/2$ has almost all
of $[0,1]$ as its phase cell, and thus $S = k_\mathrm{B}\ln(|[0,1)|) = 0$, which is of course much
larger than the non-equilibrium value $-k_\mathrm{B}\,n\ln 2$. That is the essence of the story.

Fig. 4.10 The graph of the Bernoulli map $Tx = 2x \bmod 1$ and iterations of an initial point $x_0$. Note
that $T$ is not invertible
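The passage of the macro-variable from 0 to about 1/2 under the shift can be illustrated with a small simulation. This is only a sketch under stated assumptions: a point is represented by a finite list of its binary digits (floating-point iteration of $2x \bmod 1$ would lose all digits after a few dozen steps), a conditionally typical point in $[0, 2^{-n})$ is modelled by $n$ zeros followed by fair coin flips, and the function names are hypothetical.

```python
import random

def conditionally_typical_digits(n, extra, rng):
    """Binary digits of a point x in [0, 2**-n): n leading zeros, then `extra` fair coin flips."""
    return [0] * n + [rng.randint(0, 1) for _ in range(extra)]

def macro_variable(digits, j, n):
    """rho(T^j x) = (1/n) * (r_{j+1}(x) + ... + r_{j+n}(x)): mean of n digits after j shifts."""
    return sum(digits[j:j + n]) / n

rng = random.Random(0)            # fixed seed, only to make the run reproducible
n = 1000
digits = conditionally_typical_digits(n, extra=2 * n, rng=rng)
for j in (0, n // 2, n, 2 * n):
    print(j, macro_variable(digits, j, n))
```

At $j = 0$ the macro-variable is exactly 0, halfway through the zeros it is near 1/4, and from $j = n$ onwards it stays near the equilibrium value 1/2.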

For Boltzmann, the story did not end so happily. He could not cheat on the
reversibility of the fundamental motion as we did. His map $T_t$ is the Hamiltonian flow,
which is time reversible, and this leads to extra complications which we have swept 
away by appeal to our catch phrase “for all practical purposes”. Does the fact that 
the fundamental motion is time reversible change the qualitative picture we have 
shown in Fig. 4.9? Some physicists have answered that it does! On the other hand 
one can hardly fight the intuitive feeling that the picture is right. After all, there are 
typically more possibilities for the gas molecules to distribute themselves equally 
over the whole volume (i.e., phase space, see Fig. 4.1). 

But the debate was nevertheless about this question: How can time reversible dy- 
namics give rise to phenomenologically irreversible dynamics? This is a bit like a 
debate about whether something tastes bitter, when there is nothing fundamentally 
like “bitterness” in nature. But the criticism was in fact more technically oriented,
with the question: How can one mathematically rigorously derive a time-irreversible
equation of motion from time-reversible equations? And that is indeed what Boltzmann
claimed to have achieved with his famous equation,¹⁴ and the resulting
H-theorem (see below).

Boltzmann met the criticism head on and rectified his assertions, where necessary,
without sacrificing any of his ideas. The criticism was this. In the above
mathematical example, under the $T$ map, the conditionally typical point $x$ eventually
becomes unconditionally typical, since the zeros in front are all cut away after
enough applications of $T$. Nothing in the point $T^jx$ recalls the special set $x$ once lay
in. We cannot turn the wheel back, nor the time for that matter. The map is not time
reversible. In the gas example of Fig. 4.9, for all practical purposes, the macrostate
does not distinguish the special phase point $\Phi_t\omega$ from typical phase points, but it
is nevertheless special! To see that, all we need to do is to reverse all velocities
in $\Phi_t\omega$: $\Phi_t(x,p) \to \Phi_t(x,-p)$. What does the gas then do? It returns peacefully,
and without need for a piston to push it, into the left half of the container, which
is atypical behavior at its very best. Reversing all velocities is like reversing time!
Therefore, in a closed system, the gas carries its past history around with it all the
time. For all practical purposes, we can forget that information and replace the phase
point by a typical one, because practically speaking it is impossible to reverse the
velocities of all the gas molecules.

But so what? This proves that there are in fact “bad initial conditions”, for example,
$\Phi_t(x,-p)$, for which Fig. 4.9 is not right. This was spotted by Josef Loschmidt,
a friend and colleague of Boltzmann, and this “Umkehreinwand” (reversibility ob- 
jection) led Boltzmann to recognize that his famous H-theorem (see below), which 
in its first publication claimed irreversible behavior for all initial conditions, was 
only true for typical initial conditions. Because, as Boltzmann immediately re- 
sponded, these bad initial conditions are really very special, more atypical than nec- 
essary. They are not conditionally typical, conditioned on macroscopic constraints, 
which for example can be achieved by a piston which holds all gas molecules in the 
left half of a container. The time-irreversible Boltzmann equation (on which the H- 
theorem is based) holds for conditionally typical initial conditions and governs the 
time evolution of the conditional typical value of the empirical density [analogous 
to (4.27)]: 


$$f(t,q,p) \approx \text{relative number of particles with phase space coordinates around } (q,p),$$


where the approximation holds in the sense of the law of large numbers. The
conditional typical value is time dependent since, after removal of the macroscopic
constraint, the phase points typically wander into the large phase space
volume defining equilibrium. Boltzmann showed in his H-theorem that
$H(t) = -\int f(t,q,p)\ln f(t,q,p)\,\mathrm{d}^3q\,\mathrm{d}^3p$ increases (on average) as time goes by, toward an
equilibrium value.

14 For mathematically rigorous derivations of Boltzmann’s equation in the sense of mathematical
physics, the proof by Lanford is recommended [6]. See also [7].

But the mathematician Ernst Zermelo went on to criticize Boltzmann’s view of 
atomism, which is basic to Boltzmann’s understanding of irreversibility, on the basis 
of a little theorem proved by Poincaré, known as the “Wiederkehreinwand” (recur- 
rence objection). Poincaré’s recurrence theorem is a very simple result for dynami- 
cal systems with a phase space of finite measure: 


Theorem 4.1. Poincaré’s Recurrence Theorem 


Let $(\Omega,\mathcal{B}(\Omega),\Phi,P)$ be a dynamical system, that is, $\Omega$ is a phase space, $\mathcal{B}(\Omega)$
the $\sigma$-algebra,¹⁵ $\Phi : \Omega \to \Omega$ a measurable¹⁶ (time-one) map, and $P$ a stationary
measure, i.e., $P(\Phi^{-1}A) = P(A)$, $A \in \mathcal{B}(\Omega)$. Assume the finite measure condition
$P(\Omega) < \infty$ and let $M \in \mathcal{B}(\Omega)$. Then for almost all $\omega \in M$ (i.e., except a set of
$P$-measure zero), the orbit $(\Phi^n\omega)_{n\in\mathbb{N}}$ visits $M$ infinitely often.


For simplicity, we prove the theorem for invertible $\Phi$ (as it was originally done for
Hamiltonian flows) and we omit the proof of the assertion that the recurrence occurs
infinitely often. Let $N$ be the bad set,¹⁷ i.e., the set of points in $M$ which never return
to $M$ (and thus never to $N$). Then

$$\Phi^n(N)\cap M = \emptyset \quad\text{for all } n \ge 1,$$

and for $n > k$,

$$\Phi^n(N)\cap\Phi^k(N) = \Phi^k\bigl(\Phi^{n-k}(N)\cap N\bigr) = \emptyset.$$

Therefore the measures of the sets add and, with the finite measure condition and
stationarity, which for invertible maps is equivalent to $P(\Phi A) = P(A)$, we obtain

$$\infty > P\Bigl(\bigcup_{n=0}^{\infty}\Phi^n(N)\Bigr) = \sum_{n=0}^{\infty}P\bigl(\Phi^n(N)\bigr) = \sum_{n=0}^{\infty}P(N),$$

hence $P(N) = 0$.
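The simplest setting in which the theorem can be watched at work is a finite one. In the sketch below, which is an illustration and not part of the proof, the phase space is the finite set {0, ..., size-1}, the stationary measure is counting measure, and the invertible map is a random permutation; these choices and the function name `return_times` are assumptions made for the example. In that setting the bad set is empty, since every point lies on a cycle.

```python
import random

def return_times(perm, subset):
    """For each point of `subset`, the first n >= 1 with perm^n(point) back in `subset`."""
    members = set(subset)
    times = {}
    for start in subset:
        point, steps = perm[start], 1
        while point not in members:   # terminates: the cycle through `start` returns to `start`
            point, steps = perm[point], steps + 1
        times[start] = steps
    return times

rng = random.Random(1)
size = 10_000
perm = list(range(size))
rng.shuffle(perm)                     # a random bijection of {0, ..., size-1}
M = list(range(100))                  # the set M whose recurrence we watch
times = return_times(perm, M)
print(min(times.values()), max(times.values()))
```

Every point of M returns, but the return times can be of the order of the size of the whole phase space, which already hints at how long recurrences take in large systems.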

There is no way to escape this fact. If the gas obeys the Hamiltonian law of 
motion then, when the gas started in the left half of the container, it will eventually 
return to the left half of the container — if the system remains isolated. So when the 
gas has expanded to fill the whole volume, it will not stay like that forever, but will 
return to the left half once in a while — for sure. Well, if one waits long enough, an 
equilibrium fluctuation will produce that too, and this was also Boltzmann’s answer: 
If only one could live long enough to see that! 

So what is the point here? The point is a mathematical technicality that must
be taken into account when one tries to derive long-time irreversible behavior from
time-reversible dynamics: one must ensure that the Poincaré recurrence time is
infinite. But how? By taking a limit of an infinitely large system, thereby violating
the finite measure assumption. Think of our simple example of a typical $x \in [0,1]$
and the macro-variable (4.28) $\rho$ which depends on $n$. For finite $n$, even for typical
$x \in [0,1]$, $\rho(T^jx)$ will fluctuate to value zero for $j$ large enough. There will always
be a sequence of $n$ consecutive zeros somewhere. But that can no longer happen if
we first take $n \to \infty$.

15 See footnote 21 in Sect. 4.3.
16 See (4.39).
17 That $N$ is measurable, i.e., $N \in \mathcal{B}(\Omega)$, is left as an exercise.
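How long one has to wait for such a fluctuation can be probed numerically. The sketch below models a typical $x$ by a stream of fair coin flips (an assumption standing in for the binary digits of a typical point) and looks for the first window of $n$ consecutive zeros, i.e., the first $j$ at which the macro-variable evaluated at $T^jx$ vanishes. The helper name `first_zero_run` is hypothetical.

```python
import random

def first_zero_run(bits, n):
    """Index j of the first window bits[j:j+n] consisting only of zeros, or None if there is none."""
    run = 0
    for position, bit in enumerate(bits):
        run = run + 1 if bit == 0 else 0
        if run == n:
            return position - n + 1
    return None

rng = random.Random(2)                # fixed seed for reproducibility
for n in (4, 8, 12):
    bits = [rng.randint(0, 1) for _ in range(200 * 2**n)]
    print(n, first_zero_run(bits, n))
```

The waiting time grows roughly like $2^n$, so for any fixed finite $n$ the return to the value zero does occur, while it is pushed to infinity when $n \to \infty$ is taken first.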

Boltzmann even gave a time scale on which a recurrence in the ideal gas situation
is to be expected. That time is, not surprisingly, ridiculously long, longer than the
cosmic time we imagine as the lifetime of our universe. Boltzmann’s estimate was
based on a new notion which he introduced, and which became a fashionable field
in mathematics: ergodic theory.¹⁸ According to Boltzmann, one should envision
that a system runs over the whole of phase space, and that the time it spends in
cells of macrostates is proportional to the cell size. The famous formula is, for the
Hamiltonian system,

$$\lim_{t\to\infty}\frac{1}{t}\int_0^t \mathbf{1}_A(\Phi_s\omega)\,\mathrm{d}s = \int_A \rho_E(\omega)\,\mathrm{d}\omega = P_E(A), \tag{4.32}$$


where the left-hand side is the fraction of time the system spends in $A$ and the right-hand
side is the microcanonical measure of $A$. This looks reasonable. Since the ratio
of phase space volumes of equilibrium values to non-equilibrium values is huge,
so are the time ratios. To recall the sizes, compare the numbers for equidistribution
with a fluctuation of remarkable size in a gas of $n$ particles, viz.,

$$\binom{n}{n/2} \quad\text{versus}\quad \binom{n}{(1/2\pm\varepsilon)\,n},$$

where $n$ is something like $10^{24}$. From (4.1), it is an easy exercise to show that the
ratio of the left- and right-hand numbers is $\sim\exp(2n\varepsilon^2)$, i.e., the ratio of sojourn times
in equilibrium to non-equilibrium is $\sim\exp(10^{24}\varepsilon^2)$. Whatever microscopic time scale
one uses, the resulting time spent in equilibrium is ridiculously large.
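The exponential estimate for the ratio of the two binomial coefficients can be checked with the log-gamma function. The sketch assumes that the fluctuation is a shift of the occupation number from $n/2$ to $n/2 + \varepsilon n$; the values of `n` and `eps` are illustrative (far smaller than $10^{24}$, so that the numbers stay readable), and the function names are hypothetical.

```python
from math import lgamma

def ln_binomial(n, k):
    """ln C(n, k) computed through the log-gamma function."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def ln_ratio(n, eps):
    """ln of C(n, n/2) / C(n, n/2 + eps*n): equidistribution versus a fluctuation of size eps."""
    shift = round(eps * n)
    return ln_binomial(n, n // 2) - ln_binomial(n, n // 2 + shift)

n, eps = 10**6, 0.01
print(ln_ratio(n, eps), 2 * n * eps**2)
```

Both printed numbers are close to 200 for these parameters, in line with the leading-order exponent $2n\varepsilon^2$.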


4.2.2 Our Atypical Universe 


Boltzmann held the following picture of our universe. The part of the universe which 
lies behind the horizon of what is accessible to us right now is in equilibrium — an 
equilibrium sea. We — that is we, our solar system, the galaxies we see, the past of 
our universe as far as we can witness it — merely mirror a possibly giant equilibrium 
fluctuation — a non-equilibrium island in an equilibrium sea. That sounds reasonable. 
It does suggest that the “initial condition” of our universe is after all typical. No need 
to worry any further about it. 


18 “Ergoden” was Boltzmann’s name for the microcanonical measure.

Nowadays we look much further into the universe than scientists did in Boltzmann’s
day, and we have witnesses of a much older past than in Boltzmann’s day,
but we do not see any glimpse of global equilibrium surrounding us at the new “vis- 
ible” astronomical distances, and we find no glimpse of equilibrium in the “new” far 
distant past. What does that tell us? Does it tell us that the fluctuation turned out to 
be more violent than one had judged in Boltzmann’s day? No, that would violate the 
typicality idea: no more atypical than necessary is Boltzmann’s adage. Thus a ques- 
tion arises: How giant is the fluctuation? Is there a highly ordered (lower entropy) 
past of my present? But my present is already atypical and the moral of typicality 
is: no more atypical than necessary! Then the past of my present should look less 
atypical, i.e., the entropy of the universe should increase from NOW towards the 
past and the future. The NOW is the deepest point of the fluctuation! 

Actually, we do not behave as if we believed in a fluctuation. Feynman calls 
Boltzmann’s fluctuation hypothesis ridiculous [8, p. 129]. We believe (and Boltz- 
mann did too) that there is a very special past, and in believing that, we are in fact 
reconstructing a logical past which makes sense and which proves that we are right 
in our belief. Paleontologists set out to find the missing link between ape and man 
and they eventually did — the Java man, or Homo erectus. Schliemann believed that 
Homer’s Iliad had roots in a truly existing past, which was not just the words of the 
Iliad, and he found the remains of Troy. We probe the celestial depth by sending 
out the Voyager spacecraft with a golden disc containing Chuck Berry’s Johnny B. 
Goode into space, hoping that, way out there, invisible and unknown to us, some- 
thing will appreciate good old earthly sounds. 

It is ridiculous to readjust a fluctuation after learning that non-equilibrium ex- 
tended further than previously assumed. It does make sense to readjust the atypical- 
ity set of initial conditions if we become aware of further macroscopic constraints. 
That is actually how we think and behave. We are convinced that our initial condi- 
tion is a very special one. We believe that the initial condition of the universe was 
carefully selected. So carefully, that the measure of that in numbers is way beyond 
our human scales (see, e.g., [9] for an estimate of the size of the special set, and for a 
nice drawing of a goddess carefully aiming with a sharp needle at the chosen initial 
configuration of our universe). What is the physics behind the selection? We do not 
know (see, e.g., [10] for speculations on that). That ignorance of ours deserves to be 
called an open problem: the problem of irreversibility. 


4.2.3 Ergodicity and Mixing 


Why should we need to know anything about ergodicity and mixing, if we are not 
interested in doing specific technical work in the theory of dynamical systems? In 
fact, it is not at all necessary. But since the time of Gibbs, these notions have been 
floating around as if they were fundamentally important for understanding the role 
of chance in physics, and in particular for justifying the use of equilibrium ensembles
to compute the average values of thermodynamic quantities. To avoid those
misconceptions, we shall address these notions briefly.

As we already said, ergodicity was introduced by Boltzmann for the purpose of 
computing times of sojourn from the formula (4.32). Boltzmann’s idea here was that 
a phase point moves around and that its trajectory will eventually cover the energy 
surface in a dense manner. This allowed him to estimate the enormously long times 
(more or less the Poincaré recurrence time) a system stays in the overwhelmingly 
large equilibrium regions, i.e., the regions of typicality, in order to address people’s 
worries about early returns. And that is all ergodicity is good for, apart from some 
other technical niceties. 

A misconception one often encounters is that ergodicity justifies the use of equi- 
librium distributions for averaging thermodynamic quantities, because the measure- 
ment of thermodynamic quantities takes time, time enough for the phase point to 
wander about the energy surface, so that the “time average equals the ensemble av- 
erage”. But Boltzmann’s understanding is obviously quite the opposite. Given the 
energy of a system, the overwhelming majority of its phase points on the energy 
surface are typical, i.e., equilibrium phase points, all of which look macroscopically 
more or less the same. Therefore, the value of any thermodynamic quantity is for 
all practical purposes constant on that energy surface. Averaging over the energy 
surface will merely reproduce that constant value, regardless of whether or not the 
system is ergodic. 

The mathematical definition of ergodicity focusses on the dense covering of tra- 
jectories on the energy surface or more generally on phase space. Equation (4.32) is 
then a theorem when ergodicity holds. It is noteworthy that ergodicity is equivalent 
to the law of large numbers, and that independence is stronger than mixing, which 
is stronger than ergodicity. Indeed (4.32) is technically nothing but the law of large 
numbers, which we shall show later under independence conditions. 


Definition 4.1. A dynamical system $(\Omega,\mathcal{B}(\Omega),\Phi,P)$ (see Theorem 4.1) for which
$P(\Omega) = 1$ is said to be ergodic if the invariant sets have measure zero or one:

$$A \in \mathcal{B}(\Omega) \ \text{with}\ \Phi^{-1}A = A \quad\Longrightarrow\quad P(A) = 0 \ \text{or}\ P(A) = 1. \tag{4.33}$$

Note that invariance of sets has been defined in accordance with the stationarity of
the measure, namely in terms of the pre-image. If $\Phi$ is bijective we can equivalently
define $\Phi A = A$ as an invariant set.

Birkhoff’s ergodic theorem asserts that, for an ergodic system and for any
(measurable¹⁹) $f$,

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N} f(\Phi^n\omega) = \int f(\omega)\,\mathrm{d}P(\omega) \quad\text{for $P$-almost all } \omega.$$

It is interesting to note that the proof has two parts. In the difficult part, only
stationarity of the measure is required. From that alone, it follows that the limit on the
left-hand side exists for $P$-almost all $\omega$. The ergodic hypothesis enters only to determine
the limit. The set of $\omega$ for which the limit exists is clearly invariant. If the limit
takes different values on that set, then the pre-images of the values likewise define
invariant sets, but by ergodicity, invariant sets have measure zero or one, i.e., there
can be only one pre-image up to sets of measure zero, whence

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N} f(\Phi^n\omega) = c \quad\text{for $P$-almost all } \omega.$$

Integrating this equation with respect to $P$, exchanging the limit on the left with the
integration, we see by stationarity that $c = \int f\,\mathrm{d}P(\omega)$.

19 See Sect. 4.3.

Ergodicity gives uniqueness of stationary measures in the following way. Suppose
$(\Omega,\mathcal{B}(\Omega),\Phi,P)$ and $(\Omega,\mathcal{B}(\Omega),\Phi,Q)$ are ergodic, i.e., in particular $P$ and
$Q$ are stationary (like microcanonical and canonical ensembles). Then $Q = P$ or $Q$
is concentrated on the zero measure sets of $P$, and vice versa. To see this, suppose
that $Q \neq P$, i.e., there exists a set $B$ for which $Q(B) \neq P(B)$. Setting $f(\omega) = \mathbf{1}_B(\omega)$,
the ergodic theorem implies that

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N} f(\Phi^n\omega) = \int f(\omega)\,\mathrm{d}P(\omega) = P(B) \quad\text{for $P$-almost all } \omega,$$

and

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N} f(\Phi^n\omega) = \int f(\omega)\,\mathrm{d}Q(\omega) = Q(B) \quad\text{for $Q$-almost all } \omega,$$

but the right-hand sides are not equal. We must conclude that the set of $\omega$ for which
the second statement holds, which has full $Q$-measure, is a null set of $P$. The moral is
that ergodic stationary measures are special among all stationary measures.

Let us go a little further. Up until now we have talked about a process in time,
i.e., we sample at successive discrete times. But we can also think of sampling in
space, e.g., measuring the particle number in an ideal gas in various spatially separated
cells in a gas container. Let $f_i(\omega)$ be the particle configuration in cell $i$. Instead
of moving from cell to cell, we can also stay in one cell and shift the entire configuration
of all gas molecules (where it is best to think of an infinitely extended gas, see
Fig. 4.11). We obtain a shift $\Phi : \Omega \to \Omega$ (for simplicity, we consider cells which
arise from dividing the $x$-axis, so that we need only shift in the $x$ direction). If $f$
denotes the configuration in cell 0, then $f(\Phi^i\omega)$ is the configuration in cell $i$.

It is intuitively clear that Pg is shift invariant in this sense, since that is how 
we imagine a homogeneous state to look. This gives rise to a dynamical system, 
but this time with a spatial translation flow $\Phi$. However, our notions were already
abstract, and everything transfers to spatial ergodicity and spatial ensembles. That
abstraction is useful and pays off. Recall our discussion of the mixing partitions
[compare (4.2)] on the Galton board. This characterizes stochastic independence
and constitutes the condition under which the law of large numbers holds “trivially”
[compare (4.53)]. On the other hand, to see where ergodicity stands, we remark
that the typical ergodic system is the following (see Fig. 4.12). Consider the circle
$\Omega = \{\omega \in \mathbb{R}^2, |\omega| = 1\} \cong [0,1)$ with $\omega = (\cos 2\pi x, \sin 2\pi x)$, and

$$\Phi : [0,1) \to [0,1), \qquad x \mapsto x + \alpha \bmod 1.$$

Let $P = \lambda = |\cdot|$ be the usual length. If $\alpha$ is rational, the dynamical system

$$\bigl([0,1),\mathcal{B}([0,1)),\Phi,\lambda\bigr)$$

is not ergodic (it is periodic), but if $\alpha$ is irrational, it is ergodic. This is most easily
seen by using the equivalent definition of ergodicity: $(\Omega,\mathcal{B}(\Omega),\Phi,P)$ is ergodic
if every measurable bounded invariant function $f$ on $\Omega$ is $P$-almost everywhere
constant, i.e.,

$$f\circ\Phi = f \quad\Longrightarrow\quad f = \text{const.}\ \ P\text{-almost everywhere}.$$

Fig. 4.11 Spatial shift

Fig. 4.12 Rotation on the circle


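Let us decompose $f$ into a Fourier series

$$f(x) = \sum_{n=-\infty}^{\infty} c_n\,e^{i2\pi nx},$$

with

$$(f\circ\Phi)(x) = f\bigl((x+\alpha) \bmod 1\bigr) = \sum_{n=-\infty}^{\infty} c_n\,e^{i2\pi n(x+\alpha)} = \sum_{n=-\infty}^{\infty} c_n\,e^{i2\pi n\alpha}\,e^{i2\pi nx} \overset{!}{=} \sum_{n=-\infty}^{\infty} c_n\,e^{i2\pi nx}.$$

Therefore

$$\sum_{n=-\infty}^{\infty} c_n\bigl(1 - e^{i2\pi n\alpha}\bigr)e^{i2\pi nx} = 0,$$

and also

$$c_n\bigl(1 - e^{i2\pi n\alpha}\bigr) = 0,$$

whence $c_n = 0$ or $1 - e^{i2\pi n\alpha} = 0$. But the second option cannot hold for irrational $\alpha$,
unless $n = 0$. If $\alpha$ is rational, for example, $\alpha = p/q$, choose $c_q = 1$ and all else null.
$f$ is then invariant and not constant.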
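The difference between rational and irrational rotation angles shows up directly in time averages. The sketch below compares, for the indicator function of the cell $A = [0, 0.1)$, the fraction of time an orbit spends in $A$ with the length of $A$. The choice of cell, starting point, number of steps, and the golden-mean angle are illustrative assumptions; in floating point the "irrational" angle is of course only an approximation, which does no harm over this number of steps.

```python
from math import sqrt

def time_average(alpha, x0, steps, indicator):
    """(1/steps) * sum over n < steps of indicator((x0 + n*alpha) mod 1), a Birkhoff average."""
    hits = sum(1 for n in range(steps) if indicator((x0 + n * alpha) % 1.0))
    return hits / steps

def in_cell(x):
    """Indicator of the cell A = [0, 0.1), which has length 0.1."""
    return x < 0.1

steps = 100_000
irrational = time_average((sqrt(5) - 1) / 2, 0.05, steps, in_cell)
rational = time_average(0.25, 0.05, steps, in_cell)
print(irrational, rational)
```

For the golden-mean angle the time average reproduces the length 0.1 of the cell. For $\alpha = 1/4$ the orbit visits only four points, one of which lies in the cell, so the time average is 1/4 and depends on the starting point.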

Now let us look once more at Fig. 4.9, where trajectories in phase space expand 
so that a tiny concentrated set gets distorted and convoluted under the time evolution. 
This looks so different from the rather dull rotation on circles. Gibbs introduced the 
idea that equilibrium is “natural” because of a mixing process which takes place 
in phase space. He used the analogy of a drop of ink in water, which is mixed 
up very quickly by stirring the water and thus turning the water quickly light blue 
throughout. Likewise — or so the analogy went in the minds of many people — any 
region in phase space, no matter how small in volume, will spread out under the 
dynamics after a suitably long time to points that fill out a region that is distorted 
and convoluted in such a way as to be spread more or less uniformly throughout 
the corresponding energy surface. Then, for any reasonable function f, the uniform 
average of f over the energy surface and over the time-evolved set are more or less 
the same. 

But the analogy is off target, since no one stirs in phase space. The system tra- 
jectories wander about and explore the relevant regimes of phase space for purely 
“entropic” reasons. The rough timescale over which trajectories mix all over phase 
space is again given by the Poincaré recurrence time. But since the phase space 
region of equilibrium is overwhelmingly large, a (typical) non-equilibrium phase 
point can enter rather quickly into the equilibrium region, so that the mixing time 
scale is typically reasonably short. That time scale governs the behavior of phase 
point functions like the empirical density. If one is interested in the time scale on
which Ma1 goes to Ma3, then that is determined by (conditionally) typical trajectories.
These can, for example, be read off from the Boltzmann equation for dilute
gases, which describes the physically relevant transition to equilibrium. The concept
of mixing in the sense of Gibbs, which concerns the evolution of a non-equilibrium
phase space distribution to, let us say, the microcanonical ensemble, does not play any
role in that. In Chap. 5 we shall discuss a much simpler equation, the heat equation
of Brownian motion, which is, however, exactly in the spirit of Boltzmann. For further
elaboration, see [11, 12], and also the classic [5], as well as the delightful essay
entitled The Pernicious Influence of Mathematics on Science, by Jacob Schwartz in
[13].

For the sake of completeness, we now give the definition of mixing:


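Definition 4.2. $(\Omega,\mathcal{B}(\Omega),\Phi,P)$ is called mixing if, for integrable functions $f$ and
$g$,

$$\int f(\Phi^n\omega)\,g(\omega)\,\mathrm{d}P(\omega) \overset{n\to\infty}{\longrightarrow} \int f(\omega)\,\mathrm{d}P(\omega)\int g(\omega)\,\mathrm{d}P(\omega). \tag{4.34}$$

For sets $A$ (water in a glass) and $B$ ($B$ for blue ink),

$$\underbrace{P(A\cap\Phi^nB)}_{\text{part of blue in set } A} \overset{n\to\infty}{\longrightarrow} \underbrace{P(A)\,P(B)}_{\text{proportional to size of } A}, \tag{4.35}$$

i.e., all is light blue.

Let us see why mixing implies ergodicity. Let $A$ be invariant, i.e., $\Phi(A) = A$. Then

$$P(\Phi^nA\cap A) = P(A\cap A) = P(A),$$

and by mixing,

$$P(\Phi^nA\cap A) \overset{n\to\infty}{\longrightarrow} P(A)^2,$$

that is, $P(A) = P(A)^2$, so $P(A) = 0$ or $P(A) = 1$.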
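For a concrete mixing system one can take the map $Tx = 2x \bmod 1$ with Lebesgue measure, and compute the overlap of a set $A$ with the pre-image of a set $B$ exactly. The sketch below is an illustration under stated assumptions: it uses the pre-image form, i.e., the measure of $\{x \in A : T^nx \in B\}$, which is the reading of (4.35) that matches (4.34) for indicator functions; the sets are dyadic intervals chosen for the example; and the counting on a grid of spacing $2^{-K}$ is exact only for $n \le K - 2$. All names are hypothetical.

```python
from fractions import Fraction

K = 12                      # resolution: grid points m / 2**K, m = 0, ..., 2**K - 1
GRID = 2**K

def in_interval(m, left, right):
    """True if the grid point m / 2**K lies in [left, right)."""
    return left <= Fraction(m, GRID) < right

def overlap(n, A, B):
    """Fraction of grid points x with x in A and T^n x in B, for T x = 2x mod 1."""
    count = sum(1 for m in range(GRID)
                if in_interval(m, *A) and in_interval((m << n) % GRID, *B))
    return Fraction(count, GRID)

A = (Fraction(0), Fraction(3, 8))       # "water": the interval [0, 3/8)
B = (Fraction(1, 4), Fraction(3, 4))    # "ink": the interval [1/4, 3/4)
for n in range(6):
    print(n, overlap(n, A, B))
```

The overlap starts at 1/8 for $n = 0$ and settles at $3/16$, the product of the two lengths, after only two steps; for these dyadic sets the limit is reached exactly rather than asymptotically.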
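Mixing also implies the “transition to equilibrium” of distributions. To see this,
let $(\Omega,\mathcal{B}(\Omega),\Phi,P)$ be a mixing system and $Q$ another measure, but which has a
density with respect to $P$, say $\rho(\omega) \ge 0$, with

$$\int \rho(\omega)\,\mathrm{d}P(\omega) = 1$$

and

$$\int f(\omega)\,\mathrm{d}Q(\omega) = \int f(\omega)\rho(\omega)\,\mathrm{d}P(\omega). \tag{4.36}$$

Imagine $Q$ being a “non-equilibrium” distribution. Then

$$\int f(\Phi^n\omega)\,\mathrm{d}Q(\omega) \overset{n\to\infty}{\longrightarrow} \int f(\omega)\,\mathrm{d}P(\omega),$$

which comes from (4.34) and (4.36), and the normalization of $\rho$.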
4.3 Probability Theory


Once (in Euclid’s day) axioms in mathematics were self-evident assertions. Now 
axioms are either merely useful definitions or they are the completion of a long his- 
tory of technical advances and thoughts. The axioms of probability theory are like 
that. The axioms, as they were formulated by Kolmogoroff in 1933, are the result 
of a long and highly technical development. Behind the axioms lies typicality and 
coarse-graining of points of a continuum. Underlying the axioms is Boltzmann’s 
kinetic gas theory, Lebesgue’s construction of a measure on subsets of the real num- 
bers, the Rademacher functions, and finally Hilbert’s quest for an axiomatization of 
the technical use of probability, which Hilbert — intrigued by Boltzmann’s advances 
in kinetic gas theory — formulated in the year 1900 in his famous list of 23 problems 
as the sixth problem. 

But of course none of this history is spelt out in the axioms. The axioms formulate 
the structure within which typicality arguments and coarse-graining can be phrased. 
Seen this way the prototype of a measure of typicality is the Lebesgue measure. 
This chapter is very much influenced by the writings of Mark Kac, in particular by 
his marvellous booklet [14]. 


4.3.1 Lebesgue Measure and Coarse-Graining 


The Lebesgue integral is a center of mass integration. Let $\lambda$ be the volume measure
of subsets of the real numbers, i.e., the natural generalization of length of an interval.
Then the Lebesgue integral is defined²⁰ by (see Fig. 4.13)

$$\int f(x)\,\mathrm{d}\lambda(x) \approx \sum_{i=0}^{\infty}\frac{i}{n}\,\lambda\Bigl(f^{-1}\Bigl[\frac{i}{n},\frac{i+1}{n}\Bigr]\Bigr). \tag{4.37}$$


20 First for non-negative functions, and then writing $f = f^+ - f^-$ with $f^\pm$ positive, one extends
the definition to arbitrary (measurable) functions.

Fig. 4.13 In contrast to Riemann integration, where the $x$-axis is partitioned, in Lebesgue
integration the $y$-axis is partitioned into cells of length $1/n$, and the values $f_i = i/n$ are weighted with the
measure of the pre-images $U_i = f^{-1}[i/n, (i+1)/n]$

The main point is that the Lebesgue integral starts with a partition of the $y$-axis, and
weights the values of $f$ with the measure of the pre-image. Now, here is the catch.
Suppose $f$ is fluctuating wildly, beyond any stretch of the imagination. Then the
pre-images of values of $f$ can be awful sets, not at all obtainable in any constructive
way from intervals. The question is: Do all subsets have a measure (a “length”), as
we intuitively think they would have? The answer is negative. There are sets which
are not measurable.

This is an easy exercise in analysis. It forces us (it forced the mathematicians 
of the early 20th century) to introduce a family of measurable sets. One should 
think of these sets as “constructible” from intervals, and that family is called a Borel
algebra $\mathcal{B}(\mathbb{R})$. This is a $\sigma$-algebra.²¹ So $f$ must be such that the pre-images of the
intervals on the y-axis are measurable. Such $f$ are called measurable. It is easy to
make the above idea of integration precise. It is a rather dull exercise to establish
that all sets in $\mathcal{B}(\mathbb{R})$ are measurable. The important fact is in any case that the
intuitive notion of size of a set can be realized on the Borel algebra and is called
Lebesgue measure.²² The Lebesgue integral is the “measure” integral obtained with
the Lebesgue measure. 
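
As a rough numerical illustration of the y-axis partition behind (4.37), the following Python sketch estimates the Lebesgue sum for a non-negative function on [0, 1). The name `lebesgue_sum`, the sampling of [0, 1) at midpoints, and the test function $x^2$ are assumptions made for this example only. The measure of each pre-image is approximated by counting sample points, which is adequate only for tame functions.

```python
import math


def lebesgue_sum(f, n, samples=100_000):
    """Approximate sum_i (i/n) * lambda(f^{-1}([i/n, (i+1)/n))) on [0, 1).

    The measure of each pre-image is estimated as the fraction of equally
    spaced midpoints x whose value f(x) lands in the i-th cell of the y-axis.
    """
    counts = {}
    for j in range(samples):
        x = (j + 0.5) / samples
        i = math.floor(f(x) * n)          # index of the y-cell hit by f(x)
        counts[i] = counts.get(i, 0) + 1
    return sum((i / n) * (c / samples) for i, c in counts.items())


def square(x):
    return x * x


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, lebesgue_sum(square, n))  # approaches 1/3 from below
```

Refining the partition of the y-axis (larger `n`) moves the sum towards the Riemann value 1/3, as expected for a function this regular.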

The relevant structure for generalizing the notions of set size and integration
is $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$, or for a finite measure space with the measure normalized,
$([0,1], \mathcal{B}([0,1]), \lambda)$. The generalization due to Kolmogoroff (the founder of axiomatic
probability theory) is $(\Omega, \mathcal{B}(\Omega), P)$, where in analogy with $\mathcal{B}([0,1])$ being
generated by intervals on which the Lebesgue measure is “clear”, the $\sigma$-algebra
$\mathcal{B}(\Omega)$ is now generated by some family of subsets of some “arbitrary” phase space
$\Omega$ on which the measure $P$ is somehow clear and can be extended (in analogy with


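21 A $\sigma$-algebra $\mathcal{B}(\Omega)$ obeys:


(i) $\Omega \in \mathcal{B}$,
(ii) $A \in \mathcal{B} \Rightarrow A^{c} \in \mathcal{B}$,
(iii) $(A_i)_{i\in\mathbb{N}} \subset \mathcal{B} \Rightarrow \bigcup_i A_i \in \mathcal{B}$.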


22 There is a mathematical distinction between measurable and Borel-measurable. That distinction
comes from the fact that the Lebesgue measure can live on a larger algebra than the Borel algebra. 
However, the enlargement consists of sets of Lebesgue measure zero. So forget about that. 
the Lebesgue measure) to the whole $\sigma$-algebra. But why is this triple the abstract
skeleton of probability theory? 

Recall the coarse-graining description of our Galton board machine in Fig. 4.5.
The “fundamental” typicality was defined by the canonical measure on the phase
space $\Omega$ of all the balls in the container. This measure is mapped by the coarse-graining
function $Y^n : \Omega \to \{0,1,\dots,n\}$ given by the sum $Y^n(\omega) = \sum_{i=1}^{n} X_i(\omega)$ to
a discrete measure $P'$ on the new phase space $\Omega' = \{0,1,\dots,n\}$, namely the image
space of $Y^n$. So let us forget about the “true” space and let us do our analysis on the
image space $(\Omega', \mathcal{B}(\Omega'), P')$. Then the general idea is this. Let $(\Omega, \mathcal{B}(\Omega), P)$ be
a given probability space²³ and suppose we are interested only in a coarse-grained
description of $\Omega$ by a coarse-graining function which is (very) many to one:


$$X : \Omega \to \Omega' .$$
Through $X$, the image space becomes a probability space
$$(\Omega', \mathcal{B}(\Omega'), P')$$
with image measure $P' = P_X$ given by
$$P_X(A) = P\big(X^{-1}(A)\big) . \qquad (4.38)$$


But this requires, and it is the only requirement a coarse-graining function must
fulfill,²⁴ measurability:


$$X^{-1}(M) \in \mathcal{B}(\Omega) , \qquad (4.39)$$


for all $M \in \mathcal{B}(\Omega')$. This allows two things. The coarse-graining function can be
integrated with $P$ (by analogy with the Lebesgue integral), and it can transport the
“fundamental” probability space to the image space $(\Omega', \mathcal{B}(\Omega'), P')$ on which $X$,
the coarse-graining function, is now the identity.
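
A finite toy version of the image measure (4.38) may help to fix ideas. The sketch below assumes, purely for illustration, that the “fundamental” space consists of all 0-1 strings of length n with uniform weights and that the coarse-graining function is the sum of the entries. The helper name `image_measure` is hypothetical. On a finite space every subset is measurable, so condition (4.39) is automatically satisfied.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def image_measure(P, X):
    """Return P_X with P_X({y}) = P(X^{-1}({y})) for a finite measure P.

    P is a dict mapping points omega to weights, X is the coarse-graining map.
    """
    P_X = defaultdict(Fraction)
    for omega, weight in P.items():
        P_X[X(omega)] += weight      # collect the weight of the pre-image
    return dict(P_X)


n = 4
omega_space = list(product((0, 1), repeat=n))          # assumed toy space
P = {omega: Fraction(1, 2 ** n) for omega in omega_space}
P_Y = image_measure(P, sum)                             # X = sum of entries

if __name__ == "__main__":
    for value in sorted(P_Y):
        print(value, P_Y[value])
```

The map is many to one: sixteen points of the toy space are collapsed onto five values, and the weights of the pre-images are simply added.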

Such coarse-graining functions are called random variables, a horrible and mis- 
leading terminology, because there is nothing random about them. In Euclidean ge- 
ometry the axioms, if one wishes to be axiomatic about it, are no big deal. They 
are obvious and readily implemented in proofs. But this is different in probability 
theory. The axioms are abstractions, ready to be used, but there is a price to pay. The 
price is that ontology sinks into oblivion. One easily forgets that there is a deeper 
theory of which we only describe the image. Coin-tossing is trivial when described 
by independent 1/2-1/2 probabilities. But how about the true physical process? 

The expectation value E(X) is 


23 That is just a name. 
24 Because non-measurable sets exist! 
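$$E(X) = \int X(\omega)\,P(\mathrm{d}\omega) = \lim_{n\to\infty}\sum_{i=0}^{\infty}\frac{i}{n}\,P\!\left(X^{-1}\!\left(\left[\frac{i}{n},\frac{i+1}{n}\right]\right)\right) = \int \omega'\,P_X(\mathrm{d}\omega') ,$$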


with (4.38). We shall see later that the expectation value predicts the empirical mean
(hence the name) as the image probability predicts the relative frequencies, because
the latter is $E(\chi_A(X)) = P_X(A)$, with $\chi_A$ the indicator function of the set $A$.

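$X$ can be vector-valued: $X = (X_1,\dots,X_k)$, with $X_i : \Omega \to \tilde\Omega$, $i = 1,\dots,k$. Then
$P_X = P_{(X_1,\dots,X_k)}$ is a measure on


$$\Omega' = \underbrace{\tilde\Omega\times\tilde\Omega\times\dots\times\tilde\Omega}_{k\ \text{times}} ,$$


$$P_{(X_1,\dots,X_k)}(M_1\times\dots\times M_k) = P\big(X_1^{-1}(M_1)\cap X_2^{-1}(M_2)\cap\dots\cap X_k^{-1}(M_k)\big) . \qquad (4.40)$$


$P_{(X_1,\dots,X_k)}$ is called the joint distribution of the $X_i$. Setting some of the $M_i = \tilde\Omega$, so
that only $X_{i_1},\dots,X_{i_l}$ remain specified, one calls the resulting distribution $P_{(X_{i_1},\dots,X_{i_l})}$
an $l$-point distribution and $P_{X_i}$ the marginal distribution.²⁵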

Let us stick with $([0,1), \mathcal{B}([0,1)), \lambda)$ for the moment and consider the coarse-graining
Rademacher functions $r_i(x)$, i.e., the numbers in the binary representation
of $x \in [0,1)$ (see Fig. 4.14). Take $r_1 : [0,1) \to \{0,1\}$, which is clearly coarse-graining,
the image space being $(\{0,1\}, \mathcal{P}(\{0,1\}), P_{r_1})$, where $\mathcal{P}(\{0,1\})$ is the
power set of $\{0,1\}$ and


$$P_{r_1}(0) = \lambda\big(r_1^{-1}(0)\big) = \lambda\big([0,\tfrac12)\big) = \tfrac12 ,$$


$$P_{r_1}(1) = \lambda\big(r_1^{-1}(1)\big) = \lambda\big([\tfrac12,1)\big) = \tfrac12 .$$


This is the probability space of the single coin toss: head = 0, tail = 1 and “a priori
probabilities” are each 1/2. Now go on with $r_2(x),\dots,r_n(x),\dots$. It is clear from the
graphs that, for any choice $(\delta_1,\delta_2,\delta_3) \in \{0,1\}^3$:


$$P_{(r_1,r_2,r_3)}(\delta_1,\delta_2,\delta_3) = \lambda\big(r_1^{-1}(\delta_1)\cap r_2^{-1}(\delta_2)\cap r_3^{-1}(\delta_3)\big) ,$$


25 Note that $X_i^{-1}(\tilde\Omega) = \Omega$.
Fig. 4.14 The prototype of independent random variables as coarse-graining functions.
Rademacher functions $r_k$ given by the binary expansion of $x \in [0,1)$: $x = \sum_{k=1}^{\infty} r_k(x)\,2^{-k}$


and 


$$\lambda\big(\{x\,|\,r_1(x) = \delta_1\}\cap\{x\,|\,r_2(x) = \delta_2\}\cap\{x\,|\,r_3(x) = \delta_3\}\big) = \left(\frac12\right)^{3} .$$


Likewise

$$P_{(r_{i_1},\dots,r_{i_n})}(\delta_{i_1},\dots,\delta_{i_n}) = \left(\frac12\right)^{n} = \prod_{k=1}^{n} P_{r_{i_k}}(\delta_{i_k}) , \qquad (4.41)$$

and this property defines independence of the random variables $r_{i_1},\dots,r_{i_n}$. In general,
independent random variables $X_1,\dots,X_k$ are defined by their joint distribution
being a product:


$$P_{(X_1,\dots,X_k)} = \prod_{i=1}^{k} P_{X_i} .$$


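It follows that the expectation value of products of functions $f_i = g_i(X_i)$ is the product
of the expectation values:


$$E(f_{i_1}\cdots f_{i_n}) = E(f_{i_1})\cdots E(f_{i_n}) . \qquad (4.42)$$


In particular,


$$\int\prod_{k=1}^{n} f_{i_k}\,\mathrm{d}P(\omega) = \prod_{k=1}^{n}\int f_{i_k}\,\mathrm{d}P(\omega) , \qquad (4.43)$$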


a property which practically never holds. (Choose two functions and see whether 
the integral, e.g., the usual Riemann integral, over their product equals the product 
of the integrals.) One may recall also (4.34). 
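
The suggestion in parentheses can be tried out directly. The following Python sketch, with hypothetical helper names `rademacher` and `integral_dyadic`, integrates exactly over [0, 1) functions that depend only on the first few binary digits. It confirms the product rule for two distinct Rademacher functions and shows that it fails as soon as the two factors are not independent (here: the same function taken twice).

```python
from fractions import Fraction


def rademacher(k, x):
    """k-th binary digit (k = 1, 2, ...) of x in [0, 1)."""
    return int(x * 2 ** k) % 2


def integral_dyadic(g, depth):
    """Exact integral over [0, 1) of a function that is constant on the
    dyadic intervals [j / 2**depth, (j + 1) / 2**depth)."""
    width = Fraction(1, 2 ** depth)
    return sum(g(j * width) * width for j in range(2 ** depth))


depth = 3
E_r1 = integral_dyadic(lambda x: rademacher(1, x), depth)
E_r3 = integral_dyadic(lambda x: rademacher(3, x), depth)
E_r1_r3 = integral_dyadic(lambda x: rademacher(1, x) * rademacher(3, x), depth)
E_r1_r1 = integral_dyadic(lambda x: rademacher(1, x) ** 2, depth)

if __name__ == "__main__":
    print(E_r1_r3, E_r1 * E_r3)   # equal: r_1 and r_3 are independent
    print(E_r1_r1, E_r1 * E_r1)   # different: r_1 is not independent of itself
```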
The probability space generated by $(r_1,\dots,r_n)$ from $([0,1), \mathcal{B}([0,1)), \lambda)$ is the
image space


$$\left(\{0,1\}^{n},\ \mathcal{P}(\{0,1\}^{n}),\ \prod_{i=1}^{n} P_{r_i}\right) ,$$


which is a perfect model for n-fold coin tossing. What is the “probability” that in
$n$ tosses one has tails exactly $l$ times? On the image space, this is a simple combinatorial
exercise. There are $\binom{n}{l}$ n-tuples with 1 at exactly $l$ places and everywhere
else zeros and each n-tuple has a priori probability $1/2^{n}$, and we add these to get the
answer (adding the measure of disjoint sets to get the measure of the union set is
intuitive and features as an axiom). But we can do this differently, taking the “fundamental”
space seriously. And in the hope of gaining a better grasp of the difference,
we go through the tedious complication of computing from “first principles”. (This
has been copied from [14].)
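The problem is to compute the measure of the set (heads=0, tails=1):


$$A_l = \Big\{x : \sum_{k=1}^{n} r_k(x) = l\Big\} , \quad \text{i.e.,} \quad \lambda(A_l) = \int_0^1 \chi_{A_l}(x)\,\mathrm{d}x .$$


There is no need to be fancy here with the Lebesgue measure notation. Lebesgue
integration is simply Riemann integration here. The trick is now to recognize that


$$\chi_{A_l}(x) = \frac{1}{2\pi}\int_0^{2\pi}\mathrm{d}\gamma\ e^{i\gamma\left(\sum_{k=1}^{n} r_k(x) - l\right)} .$$


We change the order of integration,


$$\lambda(A_l) = \int_0^1\mathrm{d}x\,\frac{1}{2\pi}\int_0^{2\pi}\mathrm{d}\gamma\ e^{i\gamma\left(\sum_{k=1}^{n} r_k(x) - l\right)} = \frac{1}{2\pi}\int_0^{2\pi}\mathrm{d}\gamma\ e^{-i\gamma l}\int_0^1\mathrm{d}x\,\prod_{k=1}^{n} e^{i\gamma r_k(x)} ,$$


and by virtue of the very special property (4.41) in the form (4.43), this becomes


$$\frac{1}{2\pi}\int_0^{2\pi}\mathrm{d}\gamma\ e^{-i\gamma l}\prod_{k=1}^{n}\int_0^1\mathrm{d}x\ e^{i\gamma r_k(x)} .$$


But


$$\int_0^1\mathrm{d}x\ e^{i\gamma r_k(x)} = \frac12\left(1 + e^{i\gamma}\right) ,$$

and we get the answer


$$\lambda(A_l) = \left(\frac12\right)^{n}\frac{1}{2\pi}\int_0^{2\pi}\mathrm{d}\gamma\ e^{-i\gamma l}\left(1 + e^{i\gamma}\right)^{n} = \left(\frac12\right)^{n}\binom{n}{l}$$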


with or without combinatorics, i.e., combinatorics has to come in anyway. 
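
The “first principles” computation can be checked numerically. The sketch below is an illustration only, with hypothetical function names: it evaluates the Fourier integral for $\lambda(A_l)$ by an equally spaced Riemann sum over the integration variable (written `gamma` in the code) and compares it with a direct count of dyadic intervals. For a trigonometric polynomial of degree at most n, a Riemann sum with n + 1 points is exact up to rounding, so no fine grid is needed.

```python
import cmath
import math


def measure_by_fourier(n, l):
    """(1/2)^n (1/2pi) int_0^{2pi} e^{-i gamma l} (1 + e^{i gamma})^n d gamma,
    evaluated with an (n + 1)-point Riemann sum in gamma."""
    m = n + 1
    total = 0j
    for j in range(m):
        gamma = 2 * math.pi * j / m
        total += cmath.exp(-1j * gamma * l) * (1 + cmath.exp(1j * gamma)) ** n
    return (total / m).real / 2 ** n


def measure_by_counting(n, l):
    """Lebesgue measure of {x : r_1(x) + ... + r_n(x) = l}, obtained by
    counting the dyadic intervals of length 2**-n on which the sum equals l."""
    hits = sum(1 for j in range(2 ** n) if bin(j).count("1") == l)
    return hits / 2 ** n


if __name__ == "__main__":
    n, l = 10, 3
    print(measure_by_fourier(n, l), measure_by_counting(n, l),
          math.comb(n, l) / 2 ** n)
```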
4.3.2 The Law of Large Numbers


What does it mean to say that the number


$$\left(\frac12\right)^{n}\binom{n}{l}$$

is the probability for exactly $l$ heads in $n$ tosses? According to Boltzmann it means
that typically, in a large number $N$ of trials of $n$ coin tosses, one finds that the relative
fraction of outcomes with exactly $l$ heads in $n$ tosses is close to the above number.
In other words, it is the theoretical prediction of the empirical distribution. 

Let us go through a toy model for such a prediction concerning the probability of
a single coin toss. The idea is to create an ensemble, tossing n coins simultaneously
or tossing one coin n times. We use the Rademacher coarse-graining function $r_i$ as
the map from the fundamental phase space to the outcome. We can even produce a
dynamical picture by invoking the Bernoulli map $T$ (see Fig. 4.10) with the property
(4.31), which is in fact isomorphic to the shift map analogous to Fig. 4.11. To
see this, we generalize to the infinite family $(r_1,\dots,r_n,\dots)$ of Rademacher functions.
This is nice because it gives us the chance to explain the construction of more
abstract probability spaces.

We need to construct the image Borel algebra $\mathcal{B}(\{0,1\}^{\mathbb{N}})$ over the image space
of infinite 0-1 sequences. We know the “infinite product” measure $P_B$ on cylinder
sets (playing the role of the intervals). These are sets specified by finitely many
coordinates, while all other coordinates are free, like a cylinder extending freely
over a base:


$$Z_{i_1,\dots,i_n}(\delta_1,\dots,\delta_n) = \Big\{\omega\in\{0,1\}^{\mathbb{N}} : \omega = (\dots,\underbrace{\delta_1}_{i_1\text{th place}},\dots,\underbrace{\delta_n}_{i_n\text{th place}},\dots)\Big\} .$$


This means that, in the infinite sequences, the $i_1$th to $i_n$th places are determined and
all other places are free. The measure of the cylinder set is defined by


$$P_B\big(Z_{i_1,\dots,i_n}(\delta_1,\dots,\delta_n)\big) = P_{(r_{i_1},\dots,r_{i_n})}(\delta_1,\dots,\delta_n) = \left(\frac12\right)^{n} , \qquad (4.44)$$


the product measure. The Borel algebra $\mathcal{B}(\{0,1\}^{\mathbb{N}})$ is now generated by cylinder
sets $Z(\dots)$. The measure $P_B$, fixed on cylinder sets, extends to the algebra
$\mathcal{B}(\{0,1\}^{\mathbb{N}})$ and is a product measure called Bernoulli measure. The shift $\Phi$ acts
on the sequence space $\Omega = \{0,1\}^{\mathbb{N}}$, shifting to the left and cutting away entries left
of the zero. We see immediately on cylinder sets that the measure is stationary with
respect to the shift


$$P_B\big(\Phi^{-1}Z_{i_1,\dots,i_n}(\delta_1,\dots,\delta_n)\big) = P_B\big(Z_{i_1+1,\dots,i_n+1}(\delta_1,\dots,\delta_n)\big) = \left(\frac12\right)^{n} = P_B\big(Z_{i_1,\dots,i_n}(\delta_1,\dots,\delta_n)\big) , \qquad (4.45)$$
and we conclude that the measure is invariant. It comes as no surprise that we now
have two isomorphic (via binary representation) dynamical systems. The one on
the left we call fundamental, while the one on the right we think of as being the
description on the coarse-grained level:²⁶


$$\big([0,1), \mathcal{B}([0,1)), T, \lambda\big) \cong \big(\{0,1\}^{\mathbb{N}}, \mathcal{B}(\{0,1\}^{\mathbb{N}}), \Phi, P_B\big) \qquad (4.46)$$
and


$$r_i(x) = k_i(\omega) ,\ \text{the } i\text{th coordinate of } \omega\in\{0,1\}^{\mathbb{N}} ,$$
$$\phantom{r_i(x)} = r_1(T^{i-1}x)$$
$$\phantom{r_i(x)} = k_1(\Phi^{i-1}\omega) . \qquad (4.47)$$
We can interpret the random variables $r_i$ or $k_i$ as arising dynamically from $T$ or from
the shift $\Phi$.

Now take (4.46) as the fundamental model of a coin-tossing experiment, with
$r_1(T^{k}(x))$ resulting from the kth toss. Suppose that in the first $n$ tosses we obtained
results $\delta_1,\dots,\delta_n$. What does the theory predict for the $(n+1)$th toss? Put differently,
suppose we know $r_1(T^{i}(x))$, $i = 1,\dots,n$. What do we learn from that about $x$, so
that we can be smart about our guess for the next result $r_1(T^{n+1}(x))$? Nothing! No
matter how often we toss, in this theory, we remain absolutely ignorant about future
results.²⁷ In this sense a typical $x \in [0,1)$ represents absolute uncertainty.
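
The dynamical reading of the coin tosses can be made concrete in a few lines. The sketch assumes that the Bernoulli map is $T(x) = 2x \bmod 1$ (the map itself is defined earlier in the book, so this form is an assumption here) and uses exact rational arithmetic, because floating-point numbers carry only about 53 binary digits and the map would exhaust them after as many iterations. The function names are hypothetical.

```python
from fractions import Fraction


def bernoulli_map(x):
    """Assumed form of the Bernoulli map: T(x) = 2x mod 1 on [0, 1)."""
    return (2 * x) % 1


def r1(x):
    """First Rademacher function: first binary digit of x in [0, 1)."""
    return 0 if x < Fraction(1, 2) else 1


def tosses_via_map(x, n):
    """Outcomes r_1(x), r_1(T x), ..., r_1(T^{n-1} x)."""
    outcomes = []
    for _ in range(n):
        outcomes.append(r1(x))
        x = bernoulli_map(x)
    return outcomes


def tosses_via_digits(x, n):
    """Outcomes r_1(x), ..., r_n(x) read off the binary expansion directly."""
    return [int(x * 2 ** k) % 2 for k in range(1, n + 1)]


if __name__ == "__main__":
    x = Fraction(5, 7)
    print(tosses_via_map(x, 12))
    print(tosses_via_digits(x, 12))
```

Both routes give the same 0-1 sequence, which is the content of (4.47): iterating the map and reading the first digit is the same as reading successive digits of the initial point.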
Now here is a straightforward generalization of the foregoing: 


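Definition 4.3. A sequence $(X_i)_{i\in\mathbb{Z}}$ ($\mathbb{Z}$ is more comfortable than $\mathbb{N}$) of identically
distributed independent random variables on $(\Omega, \mathcal{B}(\Omega), P)$ is called a Bernoulli
sequence. The $X_i$ are identically distributed if $P_{X_i} = P_{X_j}$.


Every Bernoulli sequence represents a dynamical system. Let $X_0 \in E$. Construct
$$\big(E^{\mathbb{Z}}, \mathcal{B}(E^{\mathbb{Z}}), \Phi, P_B\big) ,$$
with the Bernoulli shift
$$\Phi : E^{\mathbb{Z}} \to E^{\mathbb{Z}} ,$$
a shift of one place to the left, and


$$P_B = \prod_{i\in\mathbb{Z}} P_{X_i} .$$


On the new space $X_i$ becomes $k_i((e_n)_{n\in\mathbb{Z}})$, where $e_n \in E$ for all $n \in \mathbb{Z}$, or equivalently
$k_0(\Phi^{i}((e_n)_{n\in\mathbb{Z}}))$. With $(X_i)_{i\in\mathbb{Z}}$, $(f(X_i))_{i\in\mathbb{Z}}$ is also a Bernoulli sequence.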


26 There is of course the one-to-one map of the binary expansion between the two phase spaces, 
since the phase space on the right contains infinitely extended sequences. 


27 This sounds almost like the irreducible probability in quantum mechanics, does it not? 
Finally we can arrive at the empirical distribution $\rho_{\mathrm{emp}}$, the object of interest
that our mathematical theory of probability should predict. The prediction will be of
Boltzmann-type certainty, i.e., not certain but typical. The overwhelming majority
of $\omega$s will agree upon what the empirical distribution will be. The empirical distribution
or relative frequency is a random variable, a density for that matter. For the
Rademacher functions,


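$$\rho^{(N)}_{\mathrm{emp}}(y,x) = \frac1N\sum_{k=1}^{N}\delta\big(r_k(x) - y\big) = \frac1N\sum_{k=0}^{N-1}\delta\big(r_1(T^{k}(x)) - y\big) \quad [\text{by (4.47)}] ,$$
or equivalently on the image space,
$$\rho^{(N)}_{\mathrm{emp}}(y,\omega) = \frac1N\sum_{k=0}^{N-1}\delta\big(k_1(\Phi^{k}(\omega)) - y\big) \quad [\text{by (4.47)}] .$$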


If one feels uncomfortable with the 6-density, one can simply integrate a function 
of interest, e.g., the indicator function of a set (which yields the relative frequency 
of ending up in that set) or simply the identity, yielding the empirical mean of the 
random variable: 


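$$\rho^{(N)}_{\mathrm{emp}}(f,x) = \int f(y)\,\rho^{(N)}_{\mathrm{emp}}(y,x)\,\mathrm{d}y \qquad (4.48)$$
$$= \frac1N\sum_{k=1}^{N} f\big(r_k(x)\big) \qquad (4.49)$$
$$= \frac1N\sum_{k=0}^{N-1} f\big(r_1(T^{k}(x))\big) \qquad (4.50)$$
$$= \frac1N\sum_{k=0}^{N-1} f\big(k_1(\Phi^{k}(\omega))\big) . \qquad (4.51)$$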


This not only looks like, but actually is an ergodic mean [compare (4.32)], and 
since independence is stronger than ergodicity, we could conclude from Birkhoff’s 
theorem that 


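$$\lim_{N\to\infty}\rho^{(N)}_{\mathrm{emp}}(f,x) = E\big(\rho^{(N)}_{\mathrm{emp}}(f)\big) = E\big(f(r_1)\big) = \tfrac12 f(0) + \tfrac12 f(1) , \qquad (4.52)$$


for almost all $x$, i.e., the exceptional set will have Lebesgue measure zero, equivalently
for $P_B$-almost all $\omega$. We rephrase this for $f(y) = \chi_{\{1\}}(y)$, observing that $r_k$
takes only values zero or one, in the form: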


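Theorem 4.2. For all $\varepsilon > 0$,
$$\lambda\left(\left\{x\in[0,1) : \lim_{N\to\infty}\left|\frac1N\sum_{k=1}^{N} r_k(x) - \frac12\right| < \varepsilon\right\}\right) = 1 .$$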
This theorem, which has a rather simple proof when one knows a bit of Lebesgue
integration theory, says that the typical number in the interval [0,1) is a normal
number, meaning that the binary expansion²⁸ yields a 0-1 sequence with relative
frequency 1/2 of 1s. It is the fundamental theorem about numbers in the contin- 
uum and the Lebesgue measure. It marks the beginning of probability theory as a 
purely mathematical discipline, and it is moreover the prototype of the prediction 
for empirical densities we can only hope for in physically relevant situations. 

We shall be content with a weaker assertion, namely where the limit is taken out- 
side the measure. In mathematics this is then called the weak law of large numbers, 
while Theorem 4.2 is referred to as the strong law of large numbers. We prove the 
weak law in the abstract setting: 


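Theorem 4.3. Let $(X_i)_{i\in\mathbb{Z}}$ be a Bernoulli sequence of identically distributed random
variables $X_i$ on $(\Omega, \mathcal{B}(\Omega), P)$. Let $E(X_i^2) = E(X_0^2) < \infty$. Then, for any $\varepsilon > 0$,


$$P\left(\left\{\omega : \left|\frac1N\sum_{k=1}^{N} X_k(\omega) - E(X_0)\right| > \varepsilon\right\}\right) \le \frac{1}{N\varepsilon^2}\left[E(X_0^2) - E(X_0)^2\right] . \qquad (4.53)$$


Let us make a few remarks: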


(i) For simplicity of notation and without loss of generality, we have used $X_i$
instead of $f(X_i)$, where $f$ is some function of interest.

(ii) Under the conditions assumed here (ergodicity is sufficient, we assume independence),
the theoretical prediction for the empirical distribution is given
by the theoretical expectation of the empirical distribution. Here, because of
“stationarity” or identical distribution,


$$E\left(\frac1N\sum_{k=1}^{N} X_k(\omega)\right) = \frac1N\sum_{k=1}^{N} E(X_k) = \frac1N\,N\,E(X_0) = E(X_0) .$$


(iii) Taking the limit $N \to \infty$, we obtain zero on the right-hand side.
(iv) What does this theorem say? The typical value for the empirical distribution
is its theoretical expectation. The latter is commonly referred to as “probability”.


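Rephrasing this in a formula, the theorem asserts that, for a Bernoulli sequence
$X_0,\dots,X_N$,

$$P\left(\left\{\omega : \left|\rho^{(N)}_{\mathrm{emp}}(f,\omega) - \int f(\omega')\,\mathrm{d}P_{X_0}(\omega')\right| > \varepsilon\right\}\right) \xrightarrow{\ N\to\infty\ } 0 . \qquad (4.54)$$


Proof. Start with the set whose measure is taken and rewrite it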


28 It is easy to see that this is true for every p-adic expansion.
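$$A^{N}_{\varepsilon} = \left\{\omega : \left|\frac1N\sum_{k=1}^{N} X_k(\omega) - E(X_0)\right| > \varepsilon\right\} \qquad (4.55)$$
$$= \left\{\omega : \left|\frac1N\sum_{k=1}^{N}\big[X_k(\omega) - E(X_k)\big]\right|^{2} > \varepsilon^{2}\right\} . \qquad (4.56)$$


Then observe that


$$P(A^{N}_{\varepsilon}) = E(\chi_{A^{N}_{\varepsilon}}) = \int\chi_{A^{N}_{\varepsilon}}(\omega)\,\mathrm{d}P(\omega) .$$


It is always good to have an integral over a function, because that can be estimated
by a bound on the function. In this case, we use a trivial upper bound, which obviously
holds true in view of the above rewriting of the set $A^{N}_{\varepsilon}$:


$$\chi_{A^{N}_{\varepsilon}}(\omega) \le \frac{\left|\frac1N\sum_{k=1}^{N}\big[X_k(\omega) - E(X_k)\big]\right|^{2}}{\varepsilon^{2}} .$$

Hence,

$$P(A^{N}_{\varepsilon}) \le \int\frac{\left|\frac1N\sum_{k=1}^{N}\big[X_k(\omega) - E(X_k)\big]\right|^{2}}{\varepsilon^{2}}\,\mathrm{d}P(\omega)$$
$$= \frac{1}{\varepsilon^{2}N^{2}}\,E\left(\sum_{k=1}^{N}\big[X_k - E(X_k)\big]^{2} + \sum_{k\ne j}\big[X_k - E(X_k)\big]\big[X_j - E(X_j)\big]\right) ,$$


and using the fact that the variables are identically distributed and independent
(4.42), we continue


$$= \frac{1}{\varepsilon^{2}N}\,E\Big(\big[X_0 - E(X_0)\big]^{2}\Big) + \frac{1}{\varepsilon^{2}N^{2}}\sum_{k\ne j} E\big(X_k - E(X_k)\big)\,E\big(X_j - E(X_j)\big)$$
$$= \frac{1}{\varepsilon^{2}N}\left[E(X_0^{2}) - E(X_0)^{2}\right] .$$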

We have also used the trivial fact that E(X —E(X)) =0. 
This computation is instructive and should be thoroughly mastered.²⁹
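
For the fair coin the bound (4.53) can be compared with the exact probability of a deviation. The following sketch is an illustration under the assumption $X_k \in \{0,1\}$ with probabilities 1/2 each, so that $E(X_0^2) - E(X_0)^2 = 1/4$. The function names are hypothetical. The exact value is obtained from binomial coefficients rather than by simulation.

```python
from math import comb


def deviation_probability(N, eps):
    """Exact P(|S_N / N - 1/2| > eps) for N independent fair 0-1 tosses."""
    favourable = sum(comb(N, s) for s in range(N + 1)
                     if abs(s / N - 0.5) > eps)
    return favourable / 2 ** N


def chebyshev_bound(N, eps):
    """Right-hand side of (4.53) for the fair coin (variance 1/4)."""
    variance = 0.25
    return variance / (N * eps ** 2)


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, deviation_probability(N, 0.1), chebyshev_bound(N, 0.1))
```

The bound is far from sharp (the true probability decays much faster in N), but it already forces the measure of the exceptional set to zero, which is all the weak law asserts.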


29 The first inequality is Tchebychev’s. As trivial as the inequality is, it is very useful.
Chapter 5
Brownian motion 


Brownian motion has been observed since the invention of the microscope (circa 
1650). The biologist Robert Brown (1773-1858) published systematic experimental 
studies of the erratic motion of pollen and of other microscopically visible grains 
swimming in drops of water. He called them primitive molecules, and it was unclear 
what made them move. In 1905, Albert Einstein predicted a diffusive motion of 
mesoscopic particles (i.e., macroscopically very small, but still visible through a 
microscope) immersed in a fluid, by adopting Boltzmann’s view of atomism. When 
the molecules in the fluid undergo heat motion and hit the mesoscopic particle in 
a random manner, they force the particle to move around erratically. He suggested 
that this might be the already known Brownian motion. 

We shall give Einstein’s argument, which has surprisingly little to do with the 
microscopic motion of atoms, and illustrates his genius once again. Independently 
of Einstein, Smoluchowski explained Brownian motion as arising from molecular 
collisions. His argument is stochastic and is based on the $\sqrt{N}$ “density fluctuations”,
thereby disproving claims by his contemporaries that the molecular collisions, if 
happening at all, would average out to zero net effect. We shall also discuss this for 
a toy example. 

Brownian motion proved to the unbelievers of Boltzmann’s time that atoms do 
in fact exist. The quantitative prediction — diffusive motion — was experimentally 
verified by Perrin, who received the Nobel prize for that. Although based on the 
atomistic nature of matter, Brownian motion is not a quantum phenomenon. The 
reader should therefore ask why we need to know more about Brownian motion, 
when we have long since accepted the atomistic view of nature? The answer is that 
it is a physical phenomenon of the utmost importance, because it bridges a gap
between the microscopic and the macroscopic worlds, very much as Boltzmann’s 
equation does, except that Brownian motion is simpler. It is good to see the transition
from microscopic to macroscopic dynamics at work. It shows that macroscopic
motion can look totally different (diffusive and irreversible) from microscopic motion
(ballistic and reversible). Finally, the heat equation which governs the probability
of Brownian motion is Schrödinger’s equation with imaginary time, a technical
feature that is sometimes put to use.
5.1 Einstein’s Argument


This is taken from [1]. Imagine very many Brownian particles in a fluid. Very many, 
but not enough to form a crowd, i.e., a low density dilution. The Brownian particles 
themselves do not meet. The dilution can be treated as an ideal gas. For simplicity 
imagine that the density depends only on the x-coordinate. We thus have an en- 
semble of independent Brownian particles in the fluid, and according to the law of 
large numbers, the empirical density (Boltzmann’s view!) is given by the probability 
density, i.e., 


$$\rho(x,t) = \frac{N(A\,\mathrm{d}x, t)}{A\,\mathrm{d}x} ,$$


where A is the area perpendicular to the x-axis and N is the number of Brownian 
particles in the volume Adx (see Fig. 5.1). 

The aim is to derive an equation for the probability density $\rho$, the determination
of which has been transferred by the law of large numbers to that of a thermodynamic
quantity, which is all Einstein uses. The microscopic picture ends here. So
far, this is all there is in Boltzmann’s way of thinking. Now comes Einstein’s innovation.
Let $m_B$ be the mass of the Brownian particle. Then $\rho(x,t)\,m_B = \nu(x,t)$ gives
the gram-moles per unit volume. The mass flux in the x-direction is governed by the
continuity equation


$$\frac{\partial\nu(x,t)}{\partial t} = -\frac{\partial j}{\partial x} ,$$

where $j$ is the mass flux through $A$. We now make a phenomenological ansatz, to be
justified shortly. We shall assume that the flux is proportional to the gradient of $\nu$
and that the flux goes from high concentration to low concentration, bringing in the
minus sign (the factor 1/2 is chosen for later convenience):


$$j = -\frac12 D\,\frac{\partial\nu}{\partial x} , \qquad (5.1)$$

whence we obtain the desired equation

$$\frac{\partial\nu(x,t)}{\partial t} = \frac12 D\,\frac{\partial^{2}\nu}{\partial x^{2}} . \qquad (5.2)$$

Fig. 5.1 Formulating the problem of Brownian motion 
The constant D needs to be determined. The determination of D, and in fact the
“derivation” of (5.1), is where Einstein’s genius shows through. 

We first derive the osmotic force acting on a Brownian particle. Let $p$ and $p'$ be
the pressures exerted by the Brownian gas on the areas $A$ and $A'$, respectively. The
pressure difference is known as the osmotic pressure, and the osmotic force per unit
volume ($A\,\mathrm{d}x$) is (ignoring the time dependence to simplify the notation):


$$\frac{(p - p')A}{A\,\mathrm{d}x} = -\frac{\mathrm{d}p}{\mathrm{d}x} .$$

Now $p$ obeys the gas law $pA\,\mathrm{d}x = n(x)RT$, so $p = \tilde n(x)RT$, with $\tilde n(x)$ the number of
moles per unit volume, which depends here on $x$. Hence


$$\frac{(p - p')A}{A\,\mathrm{d}x} = -RT\,\frac{\partial\tilde n(x)}{\partial x} .$$


From this we deduce the osmotic force $F$ per particle, observing that, in terms of
Avogadro’s number $N_A$, the total osmotic force $(p - p')A = F\,n(x)N_A$. Hence,


$$F = -\frac{k_B T}{\tilde n(x)}\,\frac{\partial\tilde n(x)}{\partial x} ,$$


where we have used the definition of Boltzmann’s constant $k_B = R/N_A$.

The osmotic force accelerates the Brownian particle but the particle also suffers
friction $F_R = -\gamma v$, where $\gamma$ is the friction coefficient and $v$ the velocity. In equilibrium
the osmotic force equals the friction force, so that


$$\gamma v = -\frac{k_B T}{\tilde n(x)}\,\frac{\partial\tilde n(x)}{\partial x}$$
and
$$\tilde n(x)\,v = -\frac{k_B T}{\gamma}\,\frac{\partial\tilde n(x)}{\partial x} ,$$
and since
$$\tilde n \sim \frac{\nu}{m_B} ,$$
we obtain
$$\nu(x)\,v = -\frac{k_B T}{\gamma}\,\frac{\partial\nu(x)}{\partial x} .$$


So what has been achieved? We have derived (5.1), i.e., we have determined $D$! The
reason is simply that $\nu(x)v = j$, the mass flux through $A$. Hence,


$$\frac12 D = \frac{k_B T}{\gamma} ,$$




the Einstein relation, one of the most famous formulas of kinetic theory. Here is 
one reading of it. The fluid molecules are responsible for pushing the Brownian 
particle, and at the same time, by the very same effect, namely collisions, they slow 
the particle down. Fluctuation (D, which is measurable!) and dissipation (y, which 
is measurable!) have one common source: the molecular structure of the fluid. It 
is no wonder then that they are related. The greatness of Einstein’s contribution is, 
however, that it is directly aimed at the determination of the diffusion constant in 
terms of thermodynamic (i.e., measurable) quantities. We give a realistic example 
for D below. 

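Let us immediately jump to three dimensions and rephrase the result (5.2) as the
diffusion equation for $\rho(\mathbf{x},t;\mathbf{x}_0)$, where $\rho(\mathbf{x},t;\mathbf{x}_0)\,\mathrm{d}^3x$ is the probability of finding
the particle within $\mathrm{d}^3x$ around $\mathbf{x}$ at time $t$, when the particle is put into the fluid at
the position $\mathbf{x}_0$ at time 0:


$$\frac{\partial}{\partial t}\rho(\mathbf{x},t;\mathbf{x}_0) = \frac12 D\,\Delta\rho(\mathbf{x},t;\mathbf{x}_0) , \qquad (5.3)$$
with
$$\rho(\mathbf{x},0;\mathbf{x}_0) = \delta(\mathbf{x} - \mathbf{x}_0) . \qquad (5.4)$$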


Let us make some remarks on the equation and its solution. The initial condition 
(5.4) defines the fundamental solution of (5.3), which generates solutions for any 
initial distribution by integration: 


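$$\rho(\mathbf{x},t) = \int\rho(\mathbf{x},t;\mathbf{x}_0)\,\rho(\mathbf{x}_0)\,\mathrm{d}^3x_0 .$$


It also solves (5.3) and is the probability density for finding the particle at $\mathbf{x}$ at time $t$
when at time 0 the initial distribution of the position is $\rho(\mathbf{x}_0)$, with $\int\rho(\mathbf{x}_0)\,\mathrm{d}^3x_0 = 1$.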
Note that (5.3) is not time-reversal invariant. The equation describes the time- 
irreversible spreading of diffusive motion. Temperature does spread in that way 
through a medium. Hence the name heat equation. One solves (5.3) by Fourier 
decomposition, viz., 


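$$\hat\rho(\mathbf{k},t;\mathbf{x}_0) = (2\pi)^{-3/2}\int e^{-i\mathbf{k}\cdot\mathbf{x}}\,\rho(\mathbf{x},t;\mathbf{x}_0)\,\mathrm{d}^3x ,$$
and obtains
$$\hat\rho(\mathbf{k},t;\mathbf{x}_0) = e^{-k^{2}Dt/2}\,\hat\rho(\mathbf{k},0;\mathbf{x}_0) = (2\pi)^{-3/2}\,e^{-k^{2}Dt/2 - i\mathbf{k}\cdot\mathbf{x}_0} . \qquad (5.5)$$


Hence,


$$\rho(\mathbf{x},t;\mathbf{x}_0) = (2\pi)^{-3}\int e^{i\mathbf{k}\cdot\mathbf{x}}\,e^{-k^{2}Dt/2 - i\mathbf{k}\cdot\mathbf{x}_0}\,\mathrm{d}^3k , \qquad (5.6)$$
a straightforward Gaussian integration:


$$\rho(\mathbf{x},t;\mathbf{x}_0) = (2\pi)^{-3}\int e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}_0)}\,e^{-k^{2}Dt/2}\,\mathrm{d}^3k$$
$$= (2\pi)^{-3}\left(\frac{2}{Dt}\right)^{3/2}\exp\left[-\frac{(\mathbf{x}-\mathbf{x}_0)^{2}}{2Dt}\right]\int e^{-y^{2}}\,\mathrm{d}^3y$$
$$= (2\pi Dt)^{-3/2}\exp\left[-\frac{(\mathbf{x}-\mathbf{x}_0)^{2}}{2Dt}\right] , \qquad (5.7)$$


where one should know that


$$\int e^{-y^{2}}\,\mathrm{d}^3y = \int e^{-y_1^{2}}\,\mathrm{d}y_1\int e^{-y_2^{2}}\,\mathrm{d}y_2\int e^{-y_3^{2}}\,\mathrm{d}y_3 = \pi^{3/2} . \qquad (5.8)$$


In one dimension,


$$\rho(x,t;x_0) = \frac{1}{\sqrt{2\pi Dt}}\,e^{-(x-x_0)^{2}/2Dt} . \qquad (5.9)$$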

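Imagine now the zigzag paths $X(t)$, $t \in [0,\infty)$ of the Brownian particle. The distribution
$\rho(\mathbf{x},t;\mathbf{x}_0)$ is the image distribution of the coarse-graining variable $X(t)$, the
position of the Brownian particle at time $t$. The position is diffusive, which means
that the expectation value of $X^{2}(t)$ is proportional to $t$ with proportionality given by
$D$:


$$E\big((X(t) - \mathbf{x}_0)^{2}\big) = \frac{1}{(2\pi Dt)^{3/2}}\int(\mathbf{x} - \mathbf{x}_0)^{2}\,e^{-(\mathbf{x}-\mathbf{x}_0)^{2}/2Dt}\,\mathrm{d}^3x$$
$$= 3\left(\frac{1}{\sqrt{2\pi Dt}}\right)^{3}\left(\int e^{-y^{2}/2Dt}\,\mathrm{d}y\right)^{2}\int y^{2}\,e^{-y^{2}/2Dt}\,\mathrm{d}y$$
$$= 3\,\frac{1}{\sqrt{2\pi Dt}}\left(-\frac{\mathrm{d}}{\mathrm{d}\alpha}\right)\int e^{-\alpha y^{2}}\,\mathrm{d}y\ \Big|_{\alpha = 1/2Dt}$$
$$= 3Dt .$$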

It was this diffusive behavior $\langle X^{2}(t)\rangle \sim t$ that Perrin verified. $D$ can be determined
(Stokes’ friction law for the friction of a ball in a fluid brings in the viscosity) and
that can also be checked quantitatively. For a Brownian ball with radius $a$ in a fluid
with viscosity $\eta$,


$$D \sim \frac{k_B T}{6\pi\eta a} .$$


The friction can be measured (as can $\eta$ and $a$), and $T$ can be measured. Then since
$D$ can be determined from the measured variance of $X(t)$, one can determine $k_B$ or
Avogadro’s number [2].
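
To get a feeling for the orders of magnitude, here is a small numerical sketch. The input values are assumptions chosen for illustration and are not taken from the text: a ball of radius 0.5 micrometres in water with viscosity of about $10^{-3}$ Pa s at 293 K. Stokes’ law is used in its standard form $\gamma = 6\pi\eta a$, and $D$ follows this chapter’s convention with the factor 1/2 in the diffusion equation, so that $D = 2k_BT/\gamma$, twice the value in the more common convention.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def stokes_friction(eta, a):
    """Stokes friction coefficient gamma = 6 pi eta a for a ball of radius a."""
    return 6 * math.pi * eta * a


def diffusion_constant(T, eta, a):
    """D in the convention (1/2) D = k_B T / gamma used in this chapter."""
    return 2 * K_B * T / stokes_friction(eta, a)


if __name__ == "__main__":
    # Assumed illustrative values, not data from the text.
    T, eta, a = 293.0, 1.0e-3, 0.5e-6
    D = diffusion_constant(T, eta, a)
    print("D [m^2/s]:", D)
    print("rms displacement per coordinate after 1 s [m]:", math.sqrt(D * 1.0))
```

With these inputs the root-mean-square displacement along one axis after one second is of the order of a micrometre, comparable to the size of the ball itself and hence visible under a microscope.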
This is a masterpiece of theoretical thought, comparable to Boltzmann’s derivation
of the Boltzmann equation. While the latter is far more complicated, because it
does not focus on the special Brownian particles but on the fluid itself, resulting in 
a nonlinear equation for the phase space density, the former allows a direct view of 
the molecular motion via the mesoscopic Brownian particle as mediator. This might 
explain why physicists were converted to atomism only after the work of Einstein 
and Smoluchowski had become known in 1905/1906. 


5.2 On Smoluchowski’s Microscopic Derivation 


The microscopic dynamics of the fluid molecules is time-reversal invariant. The mo- 
tion of the Brownian particle is governed by a time-irreversible equation. The idea 
that the Brownian particle is kicked from all sides by the fluid molecules suggests 
that the effects should balance out, whence no net motion should be visible. So why 
do the collisions have a nonzero net effect? Why is the motion diffusive? Why is 
the probability Gaussian? Smoluchowski gives answers based on microscopic prin- 
ciples. We do not present his work (see, for example, the collection [3]), but rather 
jump to a toy model which is in the same spirit, and quite adequate to understand the 
basic idea. It is a purely mathematical model of Brownian motion, which we phrase 
in physical terms, but it lacks essentially all of the physical ingredients which make 
Brownian motion (diffusion) a very difficult problem of mathematical physics when 
one starts from first principles and aims for rigor. 

The model is one-dimensional. All molecules move on the x-axis. At time 0,
the system has the following state. It consists of infinitely many identical particles
$\{q_i, v_i\}_{i\in\mathbb{Z}}$, $q_i = il$ (i.e., $q_i \in l\mathbb{Z}$), $v_i \in \{-v, v\}$, where one should think of $l$
as representing the mean distance between the molecules in a real fluid. The $v_i$
are independently drawn from $\{-v, v\}$ with probability 1/2. The phase space is
$\Omega = \{\omega\,|\,\omega = (il, \pm v)_{i\in\mathbb{Z}}\}$ with product probability $P_B$. The particles interact via
elastic collisions. Since they all have equal masses, the worldline picture of the system
is simply given by spacetime trajectories which cross each other as shown in
Fig. 5.2.

Focus on the particle starting at $X = 0$, i.e., color it brown and follow the brown
trajectory $X_t$, which is a zigzag path. We call it the Brownian particle. All other
particles represent the molecules of the fluid. $(X_t)_{t\ge 0}$ is a coarse-graining function
depending on $\omega$, and $(X_t(\omega))_{t\ge 0}$ is a stochastic process.

What is non-equilibrium about that? The answer is: putting the Brownian particle 
at the origin at time zero. What is artificial in the model? 


• The dynamics is not given by differential equations.

• The system is infinite, which must eventually be the case for sharp mathematical
statements. Poincaré cycles must be “broken”.

• The brown particle is only distinguished by color, not by its dynamics. It is in
reality very much bigger and heavier.

• The fluid particles should be distributed according to the canonical distribution.


Fig. 5.2. A microscopic Brownian path. Note that the initial positions of the “bath particles” are 
drawn randomly. They may be distributed according to a Poisson distribution, but in the text we 
handle the simpler case where the particles start on a lattice with spacing $l$


Having said all this, the model nevertheless shows the characteristic macroscopic 
behavior we are after, without all the extreme complications a realistic model would 
introduce. 

So let us move on to $X_t(\omega)$, the brown path. It can be described as follows. Every
$\Delta t = l/2v$, there is a change of direction with probability 1/2, and


$$X_t \approx \sum_{1\le k\le[t/\Delta t]}\Delta x_k , \qquad (5.10)$$


where the Gauss bracket $[a]$ means the greatest whole number $\le a$, and $\Delta x_k \in
\{-l/2, l/2\}$ are independently identically distributed random variables assuming
values $\Delta x_k = \pm l/2$ with probability 1/2. The approximation in (5.10) concerns the
last time interval $[[t/\Delta t]\Delta t, t]$ and is unimportant. This is simply a nuisance to carry
around and we shall ignore it here. $X_t$ is thus a sum of independent random variables.
The expectation value is


$$E(X_t) = E\Big(\sum_{1\le k\le[t/\Delta t]}\Delta x_k\Big) = \sum_{1\le k\le[t/\Delta t]}E(\Delta x_k) = \Big[\frac{t}{\Delta t}\Big]\left(\frac12\Big(-\frac l2\Big) + \frac12\,\frac l2\right) = 0 , \qquad (5.11)$$


and its variance (which we should know by now, having proven the law of large
numbers) is

$$E(X_t^{2}) = \sum_{1\le k\le[t/\Delta t]}E\big((\Delta x_k)^{2}\big) = \Big[\frac{t}{\Delta t}\Big]\frac{l^{2}}{4} \approx \frac{lv}{2}\,t . \qquad (5.12)$$
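
The two moments can be checked with a short simulation of the sum (5.10). This is a sketch of the random-walk description only, not of the underlying collision dynamics. The parameter values, the seed and the function names are arbitrary assumptions for illustration.

```python
import random


def brown_path_endpoint(t, l, v, rng):
    """Sum of [t / dt] independent steps of size +l/2 or -l/2, dt = l / (2 v)."""
    dt = l / (2 * v)
    steps = int(t / dt)
    return sum(rng.choice((-l / 2, l / 2)) for _ in range(steps))


def sample_moments(t, l, v, trials=4000, seed=0):
    """Empirical mean and second moment of X_t over many independent paths."""
    rng = random.Random(seed)
    xs = [brown_path_endpoint(t, l, v, rng) for _ in range(trials)]
    mean = sum(xs) / trials
    second = sum(x * x for x in xs) / trials
    return mean, second


if __name__ == "__main__":
    t, l, v = 50.0, 1.0, 1.0
    mean, second = sample_moments(t, l, v)
    print("empirical mean:", mean, "(theory: 0)")
    print("empirical second moment:", second, "(theory:", l * v * t / 2, ")")
```

The empirical mean fluctuates around zero while the second moment settles near $lvt/2$, in line with (5.11) and (5.12).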


Hence heuristically $X(t) \sim \sqrt{t}$. This is what we are interested in. We do not care
about single collisions on the molecular scale, but only about the macroscopic
growth of the brown particle’s position. We investigate that by scaling. We go to a larger scale in
space and time. The interesting thing is to see how this is done. We go on the diffusive
scale, i.e., when time gets enlarged by a factor $1/\varepsilon^{2}$. Then when $\varepsilon$ gets small, the
Brownian motion will vanish from sight. To maintain eye contact, we must rescale
its position by a factor $\varepsilon$. For $\varepsilon \to 0$,


$$X^{\varepsilon}_t = \varepsilon X_{t/\varepsilon^{2}} . \qquad (5.13)$$


One second for $X^{\varepsilon}_t$ corresponds to $1/\varepsilon^{2}$ seconds for the brown path, which means
$\sim 1/\varepsilon^{2}$ collisions have occurred. Moreover,


$$E\big((X^{\varepsilon}_t)^{2}\big) = \varepsilon^{2}E\big(X^{2}_{t/\varepsilon^{2}}\big) \approx \frac{lv}{2}\,t .$$

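What is the distribution $\rho^{\varepsilon}(x,t)$ of $X^{\varepsilon}_t$? We get this via the Fourier transform
$\hat\rho^{\varepsilon}(k,t)$ of the distribution. This is so important that it has a name all of its own, viz., the
characteristic function:


$$(2\pi)^{1/2}\hat\rho^{\varepsilon}(k,t) = \int e^{-ikx}\rho^{\varepsilon}(x,t)\,\mathrm{d}x = E\big(e^{-ikX^{\varepsilon}_t}\big) .$$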

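We see immediately why this is useful, because introducing (5.10) yields

$$\begin{aligned}
(2\pi)^{1/2}\hat\rho^\varepsilon(k,t)
&= \mathbb{E}\Big(\exp\Big(-ik\sum_{n\le[t/\varepsilon^2\Delta t]}\varepsilon\Delta x_n\Big)\Big)\\
&= \mathbb{E}\Big(\prod_{n\le[t/\varepsilon^2\Delta t]} e^{-ik\varepsilon\Delta x_n}\Big)\\
&= \prod_{n\le[t/\varepsilon^2\Delta t]}\mathbb{E}\big(e^{-ik\varepsilon\Delta x_n}\big)\\
&= \prod_{n\le[t/\varepsilon^2\Delta t]}\Big(1 - ik\varepsilon\,\mathbb{E}(\Delta x_n)
   - \tfrac12 k^2\varepsilon^2\,\mathbb{E}\big((\Delta x_n)^2\big) + o(\varepsilon^2)\Big)\\
&= \prod_{n\le[t/\varepsilon^2\Delta t]}\Big(1 - \tfrac18 k^2\varepsilon^2 l^2 + o(\varepsilon^2)\Big) .
\end{aligned}$$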


The third equality is obtained because, by independence, expectation and product
can be interchanged, and the fourth equality is obtained by expanding the exponential.
In addition, we have used (5.11) and (5.12). Choosing a sequence of $\varepsilon$ so that

$$t/\varepsilon^2\Delta t = N \in \mathbb{N} ,$$

$$(2\pi)^{1/2}\hat\rho^\varepsilon(k,t)
= \Big[1 - \frac{k^2l^2}{8\Delta t}\,\frac{t}{N} + o\Big(\frac{1}{N}\Big)\Big]^N .$$


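For $N \to \infty$, i.e., $\varepsilon \to 0$, this converges to

$$(2\pi)^{1/2}\hat\rho(k,t) = \exp\Big(-\frac{k^2l^2}{8\Delta t}\,t\Big) ,$$

and observing that $\Delta t = l/2v$, we obtain

$$\hat\rho(k,t) = (2\pi)^{-1/2}\exp\Big(-\frac{k^2lv}{4}\,t\Big) .$$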


Comparing with (5.5), we see that

$$D = \frac{lv}{2} ,$$

whence

$$\rho(x,t) = \frac{1}{\sqrt{2\pi Dt}}\,e^{-x^2/2Dt} = \frac{1}{\sqrt{\pi lvt}}\,e^{-x^2/lvt} . \tag{5.14}$$
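The convergence of the characteristic function can also be watched directly. The sketch below is an illustration under assumptions: the function names and the values $l = v = 1$ are invented for the example. It uses the fact that a single scaled step $\varepsilon\Delta x_n = \pm\varepsilon l/2$ has characteristic function $\cos(k\varepsilon l/2)$, so that by independence the product over $N = t/\varepsilon^2\Delta t$ steps is the $N$-th power, which is then compared with the Gaussian limit $\exp(-k^2lvt/4)$.

```python
import math


def char_function_walk(k, t, n_steps, l=1.0, v=1.0):
    """E[exp(-i k X^eps_t)] for the walk with N = t / (eps^2 dt) = n_steps steps.

    One step takes the values +-eps*l/2 with probability 1/2, so its characteristic
    function is cos(k*eps*l/2); independence turns the product into an N-th power.
    """
    dt = l / (2.0 * v)
    eps = math.sqrt(t / (n_steps * dt))      # chosen so that t / (eps^2 dt) = n_steps
    return math.cos(k * eps * l / 2.0) ** n_steps


def char_function_limit(k, t, l=1.0, v=1.0):
    """Limiting value exp(-k^2 l v t / 4), i.e. (2 pi)^(1/2) times rho-hat(k, t)."""
    return math.exp(-k * k * l * v * t / 4.0)


if __name__ == "__main__":
    k, t = 1.5, 2.0
    for n in (10, 100, 1000, 10000):
        print(n, char_function_walk(k, t, n), char_function_limit(k, t))
```

The difference between the two columns shrinks roughly like $1/N$, which is the $o(1/N)$ correction dropped in the limit.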


Setting $\rho = 1/l$ as the fluid density and $\eta = v\rho$ as the viscosity, this allows for a
somewhat artificial macroscopic interpretation of the diffusion coefficient, namely
$D = v^2/2v\rho \sim k_BT/\eta$. This is physically reasonable. The diffusion is greater when
the fluid is less viscous and when the “temperature” increases. One should observe
that the limit $\varepsilon \to 0$ means that we consider the microscopic motion in infinite time,
whence the system must consist of infinitely many particles and extend infinitely in
space to avoid cycles. It is only in the limit that irreversibility becomes “true”.

What we have done mathematically is to “prove” what is known as the central 
limit theorem for independent variables.¹


5.3 Path Integration 


In (5.14), we only looked at the scaled position at time $t$. What can be said about
the scaled process $(X_t^\varepsilon)_{t\in[0,\infty)}$? We look at cylinder sets in the set of
continuous functions on the interval $[0,T]$, $T < \infty$, denoted by $C([0,T])$. For this
purpose we consider, at arbitrarily chosen times $t_1,\ldots,t_n$, what we call gates
$A_i \subset \mathbb{R}$, $i = 1,\ldots,n$, through which the functions are required to pass
(see Fig. 5.3). This defines a set of functions, namely the cylinder set

$$Z_{t_1\ldots t_n}(A_1,\ldots,A_n) = \{\omega(t)\in C([0,T])\,|\,\omega(t_1)\in A_1,\ldots,\omega(t_n)\in A_n\} ,$$

the measure of which is induced by the process $X_t^\varepsilon$:

$$\mathbb{P}^\varepsilon\big(Z_{t_1\ldots t_n}(A_1,\ldots,A_n)\big)
= \mathbb{P}\big(\{\omega\,|\,X^\varepsilon_{t_1}(\omega)\in A_1,\ldots,X^\varepsilon_{t_n}(\omega)\in A_n\}\big) . \tag{5.15}$$


This can be read as follows. A trajectory which lies in $Z_{t_1\ldots t_n}(A_1,\ldots,A_n)$
starts at zero and goes to $x_1 \in A_1$, and from there to $x_2 \in A_2$, and so forth. We can
view (5.14) as the probability that a trajectory goes from 0 to $x_1$ in time $t_1$:

$$p(x_1,t_1;0,0) = \frac{1}{\sqrt{2\pi Dt_1}}\,e^{-x_1^2/2Dt_1} .$$

Consider now the process $X_t^\varepsilon$ starting in $x_1$ at time $t_1$. It is intuitively
clear that the distribution of $X_{t_2}^\varepsilon$ is

$$p(x_2,t_2;x_1,t_1) = \frac{1}{\sqrt{2\pi D(t_2-t_1)}}\,e^{-(x_2-x_1)^2/2D(t_2-t_1)} .$$


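Fig. 5.3 Cylinder set


¹ It is easily made mathematically rigorous and the assertion is as follows. Let
$(X_i)_{i\in\mathbb{N}}$ be a sequence of identically independently distributed random variables
with $\mathbb{E}(X_i) = 0$, $\mathbb{E}(X_i^2) = \sigma^2 \ne 0$ on
$(\Omega,\mathcal{B}(\Omega),\mathbb{P})$ and assume (for an easy proof) that
$\mathbb{E}(|X_i|^3)$ is finite. Then for $S_n = \sum_{i=1}^n X_i$,

$$\lim_{n\to\infty}\mathbb{P}\Big(\Big\{\omega\,\Big|\,\frac{1}{\sigma\sqrt{n}}S_n(\omega)\in A\Big\}\Big)
= \int_A \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx .$$

This tells us how big fluctuations about the typical value are, which is the $\sqrt{N}$ law.


Putting this together, we expect an n-dimensional Gaussian distribution for (5.15)
in the limit $\varepsilon \to 0$:

$$\mathbb{P}\big(Z_{t_1\ldots t_n}(A_1,\ldots,A_n)\big)
= \int_{A_1}\!dx_1\int_{A_2}\!dx_2\cdots\int_{A_n}\!dx_n\,
\frac{\exp\big(-\frac{x_1^2}{2Dt_1}\big)}{\sqrt{2\pi Dt_1}}\,
\frac{\exp\big(-\frac{(x_2-x_1)^2}{2D(t_2-t_1)}\big)}{\sqrt{2\pi D(t_2-t_1)}}\cdots
\frac{\exp\big(-\frac{(x_n-x_{n-1})^2}{2D(t_n-t_{n-1})}\big)}{\sqrt{2\pi D(t_n-t_{n-1})}} . \tag{5.16}$$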
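The way (5.16) is read, from 0 to $x_1$ and from there to $x_2$, can be mirrored in a small numerical experiment for two gates. The sketch below is an illustration under assumptions: all function names, the gates, the times and the value $D = 1/2$ are hypothetical choices. It evaluates the double integral in (5.16) by a midpoint rule and compares it with a Monte Carlo estimate in which paths of the limiting process are sampled at $t_1$ and $t_2$ by independent Gaussian increments.

```python
import math
import random


def transition_density(x_to, t_to, x_from, t_from, D):
    """Gaussian density for going from x_from at time t_from to x_to at time t_to."""
    dt = t_to - t_from
    return math.exp(-(x_to - x_from) ** 2 / (2.0 * D * dt)) / math.sqrt(2.0 * math.pi * D * dt)


def cylinder_probability_quadrature(gates, times, D, n=200):
    """Midpoint-rule evaluation of (5.16) for two gates A_1 = [a1, b1], A_2 = [a2, b2]."""
    (a1, b1), (a2, b2) = gates
    t1, t2 = times
    h1, h2 = (b1 - a1) / n, (b2 - a2) / n
    total = 0.0
    for i in range(n):
        x1 = a1 + (i + 0.5) * h1
        inner = 0.0
        for j in range(n):
            x2 = a2 + (j + 0.5) * h2
            inner += transition_density(x2, t2, x1, t1, D)
        total += transition_density(x1, t1, 0.0, 0.0, D) * inner * h2
    return total * h1


def cylinder_probability_monte_carlo(gates, times, D, samples=100000, seed=1):
    """Fraction of sampled paths (started at 0) that pass through both gates."""
    rng = random.Random(seed)
    (a1, b1), (a2, b2) = gates
    t1, t2 = times
    hits = 0
    for _ in range(samples):
        w1 = rng.gauss(0.0, math.sqrt(D * t1))               # position at t1
        w2 = w1 + rng.gauss(0.0, math.sqrt(D * (t2 - t1)))   # independent increment up to t2
        if a1 <= w1 <= b1 and a2 <= w2 <= b2:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    gates, times, D = ((-0.5, 0.5), (0.0, 1.0)), (0.5, 1.0), 0.5
    print(cylinder_probability_quadrature(gates, times, D))
    print(cylinder_probability_monte_carlo(gates, times, D))
```

The sampler already builds in the independent Gaussian increments, so the comparison is a consistency check of the product structure in (5.16) rather than an independent derivation of it.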


The cylinder sets generate the Borel algebra $\mathcal{B}(C[0,T])$.

One needs a bit of reflection to convince oneself that $\mathcal{B}(C[0,T])$ is the Borel
algebra generated by open sets in the uniform topology
$\|\omega\|_\infty = \sup_{t\in[0,T]}|\omega(t)|$.
The corresponding measure on function space $C([0,T])$ is then the so-called Wiener
measure $\mu_W$, which is a Gaussian measure. The process $(W_t)_{t\in[0,\infty)}$, the distribution
of which is given by (5.16), is called a Wiener process or Brownian motion process.
Having a measure on function space, we can integrate functions $C([0,T]) \to \mathbb{R}^n$
(analogously to Lebesgue integration). This integration is sometimes referred to as
functional integration, Feynman–Kac path integration, or simply path integration.


Chapter 6
The Beginning of Quantum Theory


We shall keep this short. Classical physics, by which we mean Newton’s laws and 
the Maxwell—Lorentz theory (see Chap. 2), fails to explain many atomistic phenom- 
ena, apart from the coarse-grained phenomena we discussed in previous chapters. 
This means that attempts to explain atomistic detail look artificial, and are no longer 
believable. In other words, straightforward application of classical physics yields 
descriptions that contradict experience in certain situations. In that sense, for exam- 
ple, Newtonian mechanics is superior to the theory of epicycles, which was invented 
to save Ptolemy’s geocentric astronomy, because the Newtonian explanation of the 
motion of heavenly bodies is straightforward and reduces to a single equation: New- 
ton’s equation. The failures of classical physics in the atomistic regime are mirrored 
in the names of certain effects, such as the anomalous Zeeman effect. This refers to 
the complicated splitting of spectral lines in magnetic fields that was found experi- 
mentally, while Lorentz’s classical analysis led to the “normal” Zeeman effect. 

A famous example leading to quantum mechanics is black body radiation. This 
is the radiation in a box, the walls of which are kept at a fixed temperature T, so that 
a thermodynamic equilibrium between radiation and walls is achieved. The distribution
of the energy $H$ of the radiation is then given by the canonical distribution
(4.7):

$$\frac{1}{Z(T)}\exp\Big(-\frac{H}{k_BT}\Big) . \tag{6.1}$$


In view of (2.44) with $j^\mu = 0$ (and $c = 1$), we have

$$\Big(\frac{\partial^2}{\partial t^2} - \Delta\Big)A^\mu = 0 , \tag{6.2}$$

and with Fourier transformation,

$$A^\mu(\mathbf{x},t) = \frac{1}{(2\pi)^{3/2}}\int d^3k\,e^{i\mathbf{k}\cdot\mathbf{x}}\hat A^\mu(\mathbf{k},t) , \tag{6.3}$$

the Fourier modes are found from (6.2) to be

$$\ddot{\hat A}^\mu(\mathbf{k},t) = -k^2\hat A^\mu(\mathbf{k},t) . \tag{6.4}$$


For $\mathbf{k}$ fixed, (6.4) is the harmonic oscillator equation with oscillator frequency
$\omega(\mathbf{k}) = \|\mathbf{k}\|$, i.e.,

$$\ddot q = -k^2q \quad (= -\omega^2q) . \tag{6.5}$$

Therefore electromagnetic radiation consists of uncoupled harmonic oscillators.
Every $\mathbf{k}$-mode $\hat A^\mu(\mathbf{k},t)$ has frequency $\omega = \|\mathbf{k}\|$. The
functional dependence $\omega(\mathbf{k})$ is called a dispersion relation.

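Let us now consider one oscillator (6.5) and determine its mean energy at temperature $T$:

$$H = \frac{1}{2m}p^2 + \frac12 m\omega^2q^2$$

and

$$E(\omega) = \frac{\int H e^{-\beta H}\,dq\,dp}{\int e^{-\beta H}\,dq\,dp}
= -\frac{d}{d\beta}\ln\int\exp\Big[-\beta\Big(\frac{p^2}{2m} + \frac12 m\omega^2q^2\Big)\Big]dq\,dp
= -\frac{d}{d\beta}\ln\frac{2\pi}{\beta\omega} = \frac{1}{\beta} = k_BT .$$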


Hence every $\omega$-mode has the average energy $k_BT$ and the number of modes in
$[\omega,\omega+d\omega]$ is given by the number of $\mathbf{k}'$ with
$\|\mathbf{k}'\| \in [k,k+dk]$, which is $4\pi k^2dk$, whence
the number of modes is $4\pi\omega^2d\omega$. We thus obtain the average energy $U(\omega,T)$ of
radiation at temperature $T$:

$$U(\omega,T)\,d\omega \sim k_BT\,\omega^2d\omega , \tag{6.6}$$

which is called the Rayleigh–Jeans distribution. Experimentally, one finds something
else, something more reasonable for large $\omega$. What one finds is what the physicist
Wien had predicted in 1896 when he was working on the Stefan–Boltzmann
law, according to which there is a function $f$ such that

$$U(\omega,T)\,d\omega \sim \omega f\Big(\frac{\omega}{T}\Big)\,\omega^2d\omega .$$


Wien argued that

$$f\Big(\frac{\omega}{T}\Big) = \hbar\,e^{-\hbar\omega/k_BT} ,$$

so that

$$U(\omega,T)\,d\omega \sim \hbar\omega\,e^{-\hbar\omega/k_BT}\,\omega^2d\omega \quad\text{(Wien’s law)} . \tag{6.7}$$


¹ $j^\mu$ in (2.44) can be thought of as generating oscillations.


The new constant $\hbar = h/2\pi$ is to be determined by experiment. Planck then found
an interpolation between Wien and Rayleigh–Jeans which in fact reproduced the
observed $E(\omega)$ in its totality. To his own discontent he had to assume in an ad hoc
manner that the oscillators he hypothesised in the walls of the black body could only
absorb and emit discrete energies $n\hbar\omega$, $n = 0,1,2,\ldots$.

The whole paper was very involved. But then Einstein came into the picture. He
took the field (the radiation) as physically real, and assumed that it satisfied a different
law from what had previously been thought. But, with typical genius, he noted
that we do not need to know what the law looks like. In fact it suffices to suppose
that the energy of radiation of frequency $\omega$ can only be an integer multiple of $\hbar\omega$.
Then everything is suddenly trivial. We take the usual thermodynamical statistics,
but with a different dynamics of which we only know $E_n(\omega) = n\hbar\omega$. Then,

$$E(\omega) = -\frac{d}{d\beta}\ln\sum_{n=0}^\infty e^{-\beta E_n}
= -\frac{d}{d\beta}\ln\sum_{n=0}^\infty e^{-\beta n\hbar\omega} \tag{6.8}$$

$$= -\frac{d}{d\beta}\ln\Big(\frac{1}{1-e^{-\beta\hbar\omega}}\Big)
= \frac{\hbar\omega\,e^{-\beta\hbar\omega}}{1-e^{-\beta\hbar\omega}} , \tag{6.9}$$

which gives Planck’s celebrated radiation formula

$$U(\omega,T)\,d\omega \sim \frac{\hbar\omega\,e^{-\hbar\omega/k_BT}}{1-e^{-\hbar\omega/k_BT}}\,\omega^2d\omega . \tag{6.10}$$

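For $k_BT \gg \hbar\omega$ (high temperature, small frequency), we obtain approximately

$$e^{-\hbar\omega/k_BT} \approx 1 - \frac{\hbar\omega}{k_BT} .$$

When inserted in (6.10), this yields (6.6). And for $k_BT \ll \hbar\omega$, we have

$$E(\omega) \approx \hbar\omega\,e^{-\hbar\omega/k_BT} ,$$

from which we get Wien’s law (6.7).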
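The two limiting cases of the mean energy (6.9) are easy to inspect numerically. The following sketch is only an illustration under assumptions: the function names are hypothetical, units with $\hbar = k_B = 1$ are used, and the sum over levels is truncated. It compares the closed form with a direct thermal average over the levels $E_n = n\hbar\omega$ and with the Rayleigh–Jeans and Wien expressions.

```python
import math


def mean_energy_planck(hbar_omega, kT):
    """Closed form (6.9): hbar*omega * e^{-x} / (1 - e^{-x}) with x = hbar*omega / kT."""
    x = hbar_omega / kT
    return hbar_omega * math.exp(-x) / (-math.expm1(-x))   # -expm1(-x) = 1 - e^{-x}


def mean_energy_from_levels(hbar_omega, kT, n_max=5000):
    """Thermal average over the levels E_n = n*hbar*omega, truncated after n_max terms."""
    weights = [math.exp(-n * hbar_omega / kT) for n in range(n_max)]
    z = sum(weights)
    return sum(n * hbar_omega * w for n, w in enumerate(weights)) / z


if __name__ == "__main__":
    kT = 1.0
    for hbar_omega in (0.01, 1.0, 10.0):
        print(
            hbar_omega,
            mean_energy_planck(hbar_omega, kT),          # Planck, (6.9)
            mean_energy_from_levels(hbar_omega, kT),     # direct sum over n
            kT,                                          # Rayleigh-Jeans value
            hbar_omega * math.exp(-hbar_omega / kT),     # Wien value
        )
```

For small $\hbar\omega/k_BT$ the first two columns approach $k_BT$, for large $\hbar\omega/k_BT$ they approach the Wien expression.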

The new description of the electromagnetic field in terms of energy quanta (photons
of energy $\hbar\omega$) also provided a straightforward explanation of the photoelectric
effect, discovered by Hertz in 1887. Einstein did that in the second of his fundamental
1905 works, and received the Nobel prize for it. A strange thing about this
is that neither Einstein (and he did not pursue the question any further, but moved
on to gravitation) nor ourselves know today what “photons” really are. Are they
particles? Are they extended objects? Are they anything at all?

A similar ansatz was given by Bohr in 1913 for the angular momentum of an 
electron moving around the nucleus. He assumed that 


$$L = n\hbar , \quad n \in \mathbb{N} . \tag{6.11}$$


The Bohr–Sommerfeld quantization condition for the action of periodic orbits was
similar:


$$\oint p\,dq = nh . \tag{6.12}$$


This yields the spectral lines of hydrogen and hydrogen-like atoms with surprising 
precision and simplicity. 

This is all well known and has been retold often enough, so let us now move 
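on to Louis-Victor de Broglie’s ideas back in 1923. The notion of plane wave
$e^{i\mathbf{k}\cdot\mathbf{x}}$, $\mathbf{k} \in \mathbb{R}^3$ is clear. A wave packet is a
superposition of plane waves centered around a value $\mathbf{k}_0$:

$$f(\mathbf{x}) = \int e^{i\mathbf{k}\cdot\mathbf{x}}\hat f_{\mathbf{k}_0}(\mathbf{k})\,d^3k , \tag{6.13}$$

with $\hat f_{\mathbf{k}_0}(\mathbf{k})$ centered around $\mathbf{k}_0$, as in

$$\hat f_{\mathbf{k}_0}(\mathbf{k}) = \frac{1}{(2\pi\sigma^2)^{3/2}}\,e^{-(\mathbf{k}-\mathbf{k}_0)^2/2\sigma^2} . \tag{6.14}$$

Then

$$f(\mathbf{x}) = e^{i\mathbf{k}_0\cdot\mathbf{x}}\,e^{-x^2\sigma^2/2} .$$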


This means that $f$ is concentrated on a region of size $1/\sigma$. Nothing special about
that. The wavy character appears when we consider the time evolution of waves.
What is special is that the frequency $\omega$ is a function of $\mathbf{k}$. The wave character lies
in the dispersion relation $\omega(\mathbf{k})$, a relation basic to all wave phenomena. It regulates
the spreading of wave packets which group around different wavelengths, a
phenomenon of great importance for Bohmian mechanics.

The dispersion relation contains a way of relating wave phenomena to Newtonian 
notions like momentum and energy. This will be further elaborated in Sect. 9.4. 
The dispersion relation of electromagnetic waves in vacuum is $\omega^2 = c^2k^2$, and this
means that packets of light waves around different wavelengths do not separate, i.e.,
they do not spread. Electromagnetic waves in matter behave differently, for one has 
phenomenologically different dispersion relations. 

A characteristic property of waves is the ability to interfere, i.e., the superposition 
of waves creates new wavy pictures with new amplitudes and new nodal areas. This 
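is most relevant when waves in a wave packet separate.

The time evolution of (6.13) is

$$f(\mathbf{x},t) = \int e^{i[\mathbf{k}\cdot\mathbf{x}-\omega(\mathbf{k})t]}\hat f_{\mathbf{k}_0}(\mathbf{k})\,d^3k . \tag{6.15}$$

We focus on $(\mathbf{x},t)$ values for which $f(\mathbf{x},t)$ is large. To this end we expand the
phase $S(\mathbf{k})$ in the integral of (6.15) around $\mathbf{k}_0$:

$$\begin{aligned}
S(\mathbf{k}) &= \mathbf{k}\cdot\mathbf{x} - \omega(\mathbf{k})t\\
&= \mathbf{k}_0\cdot\mathbf{x} - \omega(\mathbf{k}_0)t + (\mathbf{k}-\mathbf{k}_0)\cdot\mathbf{x}
 - (\mathbf{k}-\mathbf{k}_0)\cdot\nabla\omega(\mathbf{k}_0)t\\
&\quad - \frac12\big[(\mathbf{k}-\mathbf{k}_0)\cdot\omega''(\mathbf{k}_0)(\mathbf{k}-\mathbf{k}_0)\big]t + \cdots ,
\end{aligned}$$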


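with the Hessian $\omega''(\mathbf{k}_0)$ (a matrix of second derivatives). $S(\mathbf{k})$ varies
most in the linear $\mathbf{k}$ term, so we choose $\mathbf{x}$ and $t$ in such a way that this
term vanishes. Then for such values $(\mathbf{x},t)$, there will be little “destructive
interference” and $f(\mathbf{x}',t')$ will have maximal amplitude around $(\mathbf{x},t)$. This idea
underlies the so-called stationary phase argument, and a great many quantum phenomena
are explained by it. It will appear over and over again, for example in (9.18) and
Remarks 15.8 and 15.9.

One defines the group velocity of the wave packet² $f(\mathbf{x},t)$ by

$$\mathbf{v}_g := \frac{\partial\omega}{\partial\mathbf{k}}(\mathbf{k}_0) = \nabla\omega(\mathbf{k}_0) , \tag{6.16}$$

so that $f(\mathbf{v}_gt,t)$ always has maximal amplitude. For stationary points $(\mathbf{x},t)$, the
phase $S(\mathbf{k})$ has no linear $\mathbf{k}$ term, and introducing this in (6.15) yields

$$f(\mathbf{x},t) \approx e^{i[\mathbf{k}_0\cdot\mathbf{x}-\omega(\mathbf{k}_0)t]}
\int\exp\Big\{-\frac{i}{2}\big[(\mathbf{k}-\mathbf{k}_0)\cdot\omega''(\mathbf{k}_0)(\mathbf{k}-\mathbf{k}_0)\big]t\Big\}
\hat f_{\mathbf{k}_0}(\mathbf{k})\,d^3k . \tag{6.17}$$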


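In view of (6.14) we see that the $\mathbf{k}$ width has changed. For simplicity replace
$\omega''(\mathbf{k}_0)$ by $\gamma E_3$, where $E_3$ is the $3\times3$ identity matrix, and note that the
$\mathbf{k}$ width becomes

$$\Big[\mathrm{Re}\,\frac{1}{\sigma^{-2}+i\gamma t}\Big]^{1/2} = \frac{1}{\sqrt{\sigma^{-2}+\sigma^2\gamma^2t^2}} .$$

The position width [of (6.13) with (6.14)] is the reciprocal, i.e.,

$$\sqrt{\sigma^{-2}+\sigma^2\gamma^2t^2} \approx \sigma\gamma t , \quad\text{for } t \to \infty . \tag{6.18}$$

This is referred to as the spreading of the wave packet. It spreads faster for smaller
initial position widths ($\sigma$ large). We merely mention this in passing.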
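Group velocity and spreading can be made visible in a one-dimensional toy computation. The sketch below rests on several assumptions that are not in the text: one space dimension instead of three, the dispersion $\omega(k) = \gamma k^2/2$ (so that $\omega'' = \gamma$ exactly), the parameter values, and all function names. It evaluates the integral (6.15) by a Riemann sum, locates the packet through the mean of $|f|^2$, and measures its width as $\sqrt{2\,\mathrm{Var}}$, which for a Gaussian profile $|f| \propto e^{-(x-x_c)^2/2w^2}$ equals $w$.

```python
import cmath
import math


def packet(x, t, k0=2.0, sigma=1.0, gamma=1.0, nk=241, k_span=6.0):
    """Riemann sum for f(x,t) = int exp(i[k x - omega(k) t]) f_hat(k) dk, omega = gamma k^2 / 2."""
    dk = 2.0 * k_span * sigma / (nk - 1)
    total = 0j
    for j in range(nk):
        k = k0 - k_span * sigma + j * dk
        f_hat = math.exp(-((k - k0) ** 2) / (2.0 * sigma ** 2))   # Gaussian weight around k0
        total += f_hat * cmath.exp(1j * (k * x - 0.5 * gamma * k * k * t))
    return total * dk


def center_and_width(t, x_min=-20.0, x_max=30.0, nx=501, **packet_args):
    """Mean position of |f|^2 and the width sqrt(2 Var) of the packet at time t."""
    dx = (x_max - x_min) / (nx - 1)
    xs = [x_min + i * dx for i in range(nx)]
    dens = [abs(packet(x, t, **packet_args)) ** 2 for x in xs]
    norm = sum(dens)
    mean = sum(x * d for x, d in zip(xs, dens)) / norm
    var = sum((x - mean) ** 2 * d for x, d in zip(xs, dens)) / norm
    return mean, math.sqrt(2.0 * var)


if __name__ == "__main__":
    k0, sigma, gamma = 2.0, 1.0, 1.0
    for t in (0.0, 1.0, 3.0):
        center, width = center_and_width(t, k0=k0, sigma=sigma, gamma=gamma)
        expected_width = math.sqrt(sigma ** -2 + sigma ** 2 * gamma ** 2 * t ** 2)
        # center should follow v_g * t with v_g = gamma * k0, width should follow (6.18)
        print(t, center, gamma * k0 * t, width, expected_width)
```

The fixed position window is an arbitrary choice that is wide enough for the times used here; for later times it would have to be enlarged.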


² The group velocity is the “phenomenological” velocity of a wave. It is determined by the motion
of the maximal amplitude which is influenced by interference. The phase velocity of an almost
plane wave is determined by the $(\mathbf{x},t)$ pairs of constant phase,
$\mathbf{k}_0\cdot\mathbf{x} - \omega(\mathbf{k}_0)t = 0$.


Let us go on with de Broglie’s idea. In spacetime, electromagnetic radiation
moves along light cones. That is to say, the trajectories of what we would like to
call photons, if they exist, should lie on light cones, where $ds^2 = 0$. Thus proper
time is no good for parameterizing photon trajectories. To get a consistent picture of
these photons as particles in some sense, the energy-momentum relation must read
$E^2/c^2 - p^2 = 0$, i.e., the rest mass of photons must be zero. A photon with frequency
$\omega$ has $E = \hbar\omega$ and hence $\hbar^2\omega^2/c^2 - p^2 = 0$.

Since the dispersion relation for electromagnetic waves is $\omega = c\|\mathbf{k}\|$, it is natural
to set $\mathbf{p} = \hbar\mathbf{k}$. Hence the energy-momentum four-vector for photons would be

$$p^\mu = \hbar\Big(\frac{\omega}{c}, \mathbf{k}\Big) .$$


Let us connect with Hamilton’s idea of reading Newtonian mechanics by analogy 
with the geometric optics of a wave theory, and Einstein’s idea of introducing pho- 
tons into a wave theory. Then we arrive at de Broglie’s 1923 idea of attempting once 
again to unite wave and particle, which gave the de Broglie matter waves. 

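It should be noted that this early attempt differed from Hamilton’s. De Broglie
had in mind one wave per particle. The basic idea is this. Insert $E = \hbar\omega$ and
$\mathbf{p} = \hbar\mathbf{k}$ into the energy-momentum relation for a particle of rest mass $m$, viz.,

$$\frac{E^2}{c^2} - p^2 = m^2c^2 , \quad\text{which gives}\quad
\omega(\mathbf{k}) = \sqrt{c^2k^2 + \frac{m^2c^4}{\hbar^2}} . \tag{6.19}$$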


This is almost too good to be true. Bohr’s quantization condition emerges at once, 
requiring the “electron wave” circling the nucleus to be a standing wave. Then with 
$\lambda = 2\pi/k$ as wavelength, put

$$2\pi r = n\lambda = n\frac{2\pi}{k} = n\frac{2\pi\hbar}{p} ,$$

which yields

$$L = rp = n\hbar .$$


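But a particle should be associated with a wave packet around some $\mathbf{k}_0$. The
particle’s velocity should be the group velocity $\mathbf{v}_g$ given by (6.16). Is that consistent?
The momentum was already fixed to be $\mathbf{p} = \hbar\mathbf{k}_0$, while on the other hand

$$\hbar\mathbf{k}_0 = \mathbf{p} = \tilde m\mathbf{v}_g = \tilde m\frac{\partial\omega}{\partial\mathbf{k}}(\mathbf{k}_0) \tag{6.20}$$

should hold. But (6.19) is already fixed. It is rather surprising that (6.19) solves the
equation (6.20). According to (6.19), we get

$$\tilde m\frac{\partial\omega}{\partial\mathbf{k}}(\mathbf{k}_0)
= \tilde m\frac{c^2\mathbf{k}_0}{\omega(\mathbf{k}_0)}
= \frac{\tilde mc^2}{\hbar\omega(\mathbf{k}_0)}\,\hbar\mathbf{k}_0
= \frac{\tilde mc^2}{E}\,\hbar\mathbf{k}_0 = \hbar\mathbf{k}_0 ,$$

where $\tilde m = m/\sqrt{1-v_g^2/c^2}$ denotes the relativistic mass, so that
$\tilde mc^2 = E = \hbar\omega(\mathbf{k}_0)$.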


Equation (6.19) is the relativistic dispersion relation for a particle of mass $m$. Making
the Newtonian approximation yields the Galilean dispersion relation (dropping the
constant phase)

$$\omega(\mathbf{k}) = \frac{\hbar k^2}{2m} . \tag{6.21}$$

This is the key to Schrödinger’s equation.
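The consistency of momentum and group velocity, and the Newtonian approximation, can be checked with a few lines of arithmetic. The sketch below is an illustration under assumptions: it works in one dimension, uses units with $\hbar = c = 1$ by default, takes the group velocity from a finite difference, and all function names are hypothetical. It uses the relativistic dispersion $\omega(k) = \sqrt{c^2k^2 + m^2c^4/\hbar^2}$ and the relativistic momentum $mv/\sqrt{1-v^2/c^2}$.

```python
import math


def omega_rel(k, m, c=1.0, hbar=1.0):
    """Relativistic dispersion omega(k) = sqrt(c^2 k^2 + m^2 c^4 / hbar^2)."""
    return math.sqrt(c * c * k * k + (m * c * c / hbar) ** 2)


def group_velocity(k, m, c=1.0, hbar=1.0, h=1e-6):
    """Central-difference approximation of d(omega)/dk at k."""
    return (omega_rel(k + h, m, c, hbar) - omega_rel(k - h, m, c, hbar)) / (2.0 * h)


def relativistic_momentum(v, m, c=1.0):
    """Momentum m v / sqrt(1 - v^2 / c^2) of a particle of rest mass m."""
    return m * v / math.sqrt(1.0 - (v / c) ** 2)


if __name__ == "__main__":
    m = 1.0
    for k in (0.01, 0.1, 1.0, 3.0):
        v_g = group_velocity(k, m)
        newtonian = k * k / (2.0 * m)              # Galilean dispersion, hbar = 1
        # second column should reproduce hbar*k; the last two agree for small k
        print(k, relativistic_momentum(v_g, m), omega_rel(k, m) - m, newtonian)
```

The constant subtracted in the third column is $mc^2/\hbar$, the "constant phase" that is dropped in the Galilean dispersion relation.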

The careful reader will have noticed that we did not spell out exactly how wave 
and particle were brought together. Louis de Broglie had various ideas about that. 
He tried one idea (after the advent of Schrödinger’s wave equation, which he
subsequently used) and wrote the equations of Bohmian mechanics on the blackboard at
the famous fifth Solvay conference in 1927. His colleagues did not like it and gave 
him a hard time, especially Wolfgang Pauli, who actually reformulated de Broglie’s 
suggestion as in (7.16). Discouraged by the general animosity of his colleagues, he 
did not pursue his idea any further, and it was not until 1952 that David Bohm [1] 
published the equations again without being aware of the Solvay history (more de- 
tails of this history can be found in [2] and [3]). However, Bohm did not only write 
down the equations, which incidentally are quite obvious. He also analyzed the new 
theory, explaining non-relativistic quantum theory. All mysteries evaporated under 
Bohm’s analysis, except maybe one: Where does quantum randomness come from?


Chapter 7
Schrödinger’s Equation


Over the period 1925–1926, Werner Heisenberg, Max Born, Pascual Jordan, Paul
Dirac and Erwin Schrödinger discovered “modern” quantum theory almost
simultaneously. Schrödinger’s first steps were rather different from Heisenberg’s.
Schrödinger turned de Broglie’s 1923 idea of matter waves into a mathematical
theory connecting them to the eigenvalue problem of partial differential operators,
a prospering topic in mathematical physics at the time: eigenmodes and discrete
eigenvalues fitted well with the discreteness of spectral lines. Schrödinger found the
partial differential equation which governed all that.


7.1 The Equation 


Consider the de Broglie wave packet 


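$$\psi(\mathbf{q},t) = \int e^{i[\mathbf{k}\cdot\mathbf{q}-\omega(\mathbf{k})t]}\hat f(\mathbf{k})\,d^3k .$$

Introducing the dispersion relation (6.21), viz.,

$$\omega(\mathbf{k}) = \frac{\hbar k^2}{2m} , \tag{7.1}$$

we realize that differentiation with respect to time can also be achieved by
differentiating with respect to position:

$$i\frac{\partial\psi}{\partial t} = -\frac{\hbar}{2m}\frac{\partial^2}{\partial\mathbf{q}^2}\psi .$$
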
In a manner of speaking this equation governs the freely moving wave packet of a 
particle of “mass” m. 
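That the time derivative can be traded for two position derivatives is easy to confirm on a finite superposition of plane waves. The sketch below is an illustration under assumptions: one space dimension, units with $\hbar = m = 1$ by default, a discrete list of modes standing in for the integral, and hypothetical function names. It evaluates $i\,\partial\psi/\partial t + (\hbar/2m)\,\partial^2\psi/\partial q^2$ by central differences; for the dispersion (7.1) this residual should vanish up to discretization error.

```python
import cmath


def omega_free(k, m=1.0, hbar=1.0):
    """Galilean dispersion omega(k) = hbar k^2 / (2 m)."""
    return hbar * k * k / (2.0 * m)


def packet(q, t, modes, m=1.0, hbar=1.0):
    """Finite superposition sum_j c_j exp(i[k_j q - omega(k_j) t]) standing in for the integral."""
    return sum(c * cmath.exp(1j * (k * q - omega_free(k, m, hbar) * t)) for k, c in modes)


def free_equation_residual(q, t, modes, m=1.0, hbar=1.0, h=1e-4):
    """|i dpsi/dt + (hbar / 2m) d^2psi/dq^2| evaluated with central differences."""
    def psi(qq, tt):
        return packet(qq, tt, modes, m, hbar)

    dpsi_dt = (psi(q, t + h) - psi(q, t - h)) / (2.0 * h)
    d2psi_dq2 = (psi(q + h, t) - 2.0 * psi(q, t) + psi(q - h, t)) / (h * h)
    return abs(1j * dpsi_dt + hbar / (2.0 * m) * d2psi_dq2)


if __name__ == "__main__":
    modes = [(1.0, 0.5), (2.0, 1.0), (3.5, 0.25j)]    # pairs (k_j, c_j), arbitrary choices
    print(abs(packet(0.3, 0.7, modes)), free_equation_residual(0.3, 0.7, modes))
```

The residual is limited only by the finite-difference step `h`, not by the choice of modes.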


Now according to de Broglie $\hbar\omega = E$, which suggests an analogy with the
Hamiltonian of a freely moving particle:

$$H(\mathbf{p}) = \frac{p^2}{2m} .$$

We may therefore view

$$-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial\mathbf{q}^2}$$

as the analogue of $H(\mathbf{p})$. Extending this to the general form of the Hamiltonian

$$H = \frac{p^2}{2m} + V(\mathbf{q}) ,$$

it seems plausible to add $V(\mathbf{q})$ to $-(\hbar^2/2m)(\partial^2/\partial\mathbf{q}^2)$, leading to

$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{q},t)
= \Big[-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial\mathbf{q}^2} + V(\mathbf{q})\Big]\psi(\mathbf{q},t) .$$

The nice thing about the analogy is that the Hamiltonian is a function on phase
space. Therefore, extending the analogy to an $N$-particle system, we arrive at

$$i\hbar\frac{\partial}{\partial t}\psi(q,t)
= \Big[-\sum_{i=1}^N\frac{\hbar^2}{2m_i}\frac{\partial^2}{\partial\mathbf{q}_i^2} + V(q)\Big]\psi(q,t) . \tag{7.2}$$

Here $q = (\mathbf{q}_1,\ldots,\mathbf{q}_N)$, and the wave function $\psi$ for an $N$-particle system
comes out “naturally” as a function of configuration and time, i.e.,
$\psi(q,t) = \psi(\mathbf{q}_1,\ldots,\mathbf{q}_N,t)$.


Equation (7.2) is the celebrated Schrödinger wave equation for an $N$-particle system.

Schrödinger’s way of arriving at his equation was different from ours, but by
no means more straightforward. The differential operator (including the potential
as a multiplication operator) on the right-hand side of (7.2) is commonly called the
Hamilton operator. The main reason for this name is that it is built from the old
Hamilton function, but with $\mathbf{p}_i$ replaced by the momentum operator

$$\hat{\mathbf{p}}_i = -i\hbar\frac{\partial}{\partial\mathbf{q}_i} ,$$

while $\mathbf{q}_i$ becomes the position multiplication operator $\hat{\mathbf{q}}_i := \mathbf{q}_i$:
multiply by $\mathbf{q}_i$. For some, the essence of quantum physics is to “put hats” on
classical observables to turn them into operators.

Why is it nice that the wave function is a function on configuration space, when 
this is so very different from de Broglie’s original idea of having a particle and a 
wave — one wave per particle? The answer is that this turns out to be the correct 
description of nature. Spectral lines for many-electron atoms result correctly from
ground states and excited states,¹ just because the corresponding wave functions are
bona fide functions on configuration space, i.e., they do not factorize into
“single-particle” wave functions. Schrödinger’s wave function was, and will always be, a
function on configuration space.

It was recognized, mainly by Schrödinger and Einstein, that this fact might be
a revolution in physics. It is in fact the revolution brought upon us by quantum 
mechanics. Not wave and/or particle, not operator observable, not uncertainty prin- 
ciple, and certainly not philosophical talk, even when filled with good sense (if 
that is possible). These are no revolution with respect to good old classical physics. 
What is new is that the description of nature needs a function on the configuration 
space of all particles in the system. And why is that revolutionary? The point is that 
such a description involves all particles in the universe at once, whence all particles 
are “entangled” with each other, and there is no obvious reason why particles that 
are very far apart from each other should become disentangled. Entanglement was 
the word Schrödinger used, acknowledged and celebrated almost a century later.
Entanglement is the source of nonlocality. We shall discuss that in the chapter on 
nonlocality. 

Now here is a less profound assertion: the Schrödinger equation is the simplest
Galilean invariant wave equation one can write down. That is in the end how we
should view that equation. Never mind how we came to it! Mathematicians may
want to argue about the terminology “simple”, and others may argue about Galilean
invariance, although physicists do not usually argue with that. What is there to argue
about Galilean invariance? Clearly, $V$ has to be a Galilean invariant potential, so
there is nothing to argue about there. What may seem puzzling, however, is that
the Schrödinger equation contains a first order derivative in time, so what about
time-reversal invariance? Another disquieting observation is the imaginary unit $i$
multiplying the time derivative. That means that $\psi$ will in general be complex. As if
complex numbers could suddenly achieve physical reality! They do in fact, and we
shall come to that.

Before that we begin with the simple things. We may forget about the potential,
which must be a Galilean invariant expression and nothing more needs to be said
about that. Let us look for simplicity at the one-particle equation

$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\Delta\psi . \tag{7.3}$$

First check translation invariance in space ($\mathbf{q}' = \mathbf{q}+\mathbf{a}$) and time
($t' = t+s$), as well as rotation invariance ($\mathbf{q}' = R\mathbf{q}$, $t' = t$). The invariance
is easy to see. Both

$$\psi'(\mathbf{q}',t') = \psi(\mathbf{q}'-\mathbf{a},t'-s)$$

and

$$\psi'(\mathbf{q}',t') = \psi(R^{-1}\mathbf{q}',t')$$

satisfy (7.3) in the primed variables. (Note that $\Delta = \nabla\cdot\nabla$ is a scalar and
hence automatically rotation invariant.)


¹ $V$ can be taken as sum of Coulomb potentials between the electrons surrounding the nucleus
and the Coulomb potential between the electrons and the nucleus, “binding” the electrons to the
nucleus.


In order to see time reversibility, one must open up a new box. Suppose $t' = -t$,
$\mathbf{q}' = \mathbf{q}$, and suppose that $\psi'(\mathbf{q}',t') = \psi(\mathbf{q}',-t')$. Equation (7.3)
contains only one derivative with respect to time, and therefore the left-hand side of (7.3)
changes sign, while the right-hand side remains unchanged. This is not good. Chapter 3 helps us
to move on. To $t \mapsto -t$ and ($\mathbf{q}' = \mathbf{q}$), we adjoin complex conjugation
$i \mapsto -i$, i.e., $\psi \mapsto \psi^*$. Then setting

$$\psi'(\mathbf{q}',t') = \psi^*(\mathbf{q}',-t') ,$$

we see that $\psi'$ satisfies (7.3) (in primed variables). How is this? We take the complex
conjugate of (7.3) and see that there is an extra minus sign. But hold on! What if
there is a potential function $V(\mathbf{q})$ in the equation, as there generally will be? Then
taking the complex conjugate of the equation (7.2), the resulting equation contains
$V^*$. Time reversal holds only if $V = V^*$, i.e., if $V$ is real!

Now, the way we argued that the Schrödinger equation comes about, it seems natural
that $V$, as the classical potential, should be real. But once the equation is there,
the meaning of $V$ must emerge from the analysis of the new theory, and $V$ could in
principle be any Galilean invariant function. Since $\psi$ is complex, why not $V$? Here
we now see that $V$ must be real for the “theory” to be time-reversal invariant.

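We come now to Galilean boosts when the coordinate system is changed to a
frame which moves with relative velocity $-\mathbf{u}$. Then

$$\mathbf{q}' = \mathbf{q}+\mathbf{u}t \quad (\mathbf{v}' = \mathbf{v}+\mathbf{u}) , \qquad t' = t . \tag{7.4}$$

Suppose

$$\psi'(\mathbf{q}',t') = \psi(\mathbf{q}'-\mathbf{u}t',t') . \tag{7.5}$$

But then (in the following $\mathbf{q}$ stands for $\mathbf{q}'-\mathbf{u}t'$)

$$\begin{aligned}
i\hbar\frac{\partial}{\partial t'}\psi'(\mathbf{q}',t')
&= i\hbar\frac{\partial}{\partial t'}\psi(\mathbf{q}'-\mathbf{u}t',t')\\
&= i\hbar\frac{\partial}{\partial t'}\psi(\mathbf{q},t') - i\hbar\mathbf{u}\cdot\nabla\psi(\mathbf{q},t')\\
&= -\frac{\hbar^2}{2m}\Delta\psi(\mathbf{q},t') - i\hbar\mathbf{u}\cdot\nabla\psi(\mathbf{q},t')\\
&= -\frac{\hbar^2}{2m}\Delta'\psi'(\mathbf{q}',t') - i\hbar\mathbf{u}\cdot\nabla'\psi'(\mathbf{q}',t') ,
\end{aligned}\tag{7.6}$$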
m 


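and we obtain an extra term in the Schrödinger equation. The transformation of $\psi$
under boosts must be more complicated than (7.5). No wonder you might say. It is
after all a wave which we wish to boost in this way.

So let us consider the de Broglie plane wave
$e^{i[\mathbf{k}\cdot\mathbf{q}-\omega(\mathbf{k})t]}$, with dispersion

$$\omega(\mathbf{k}) = \frac{\hbar k^2}{2m} .$$

Applying (7.5), setting $\mathbf{k}_u = m\mathbf{u}/\hbar$, and doing simple binomial analysis
using the dispersion, we obtain

$$\begin{aligned}
e^{i[\mathbf{k}\cdot(\mathbf{q}'-\mathbf{u}t')-\omega(\mathbf{k})t']}
&= \exp\Big\{i\Big[\mathbf{k}\cdot\mathbf{q}' - \Big(\frac{\hbar\mathbf{k}\cdot\mathbf{k}_u}{m} + \omega(\mathbf{k})\Big)t'\Big]\Big\}\\
&= \exp\big\{i\big[\mathbf{k}\cdot\mathbf{q}' - \omega(\mathbf{k}+\mathbf{k}_u)t' + \omega(\mathbf{k}_u)t'\big]\big\}\\
&= \exp\big\{i\big[(\mathbf{k}+\mathbf{k}_u)\cdot\mathbf{q}' - \omega(\mathbf{k}+\mathbf{k}_u)t'\big]\big\}
   \exp\big\{-i\big[\mathbf{k}_u\cdot\mathbf{q}' - \omega(\mathbf{k}_u)t'\big]\big\} .
\end{aligned}$$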
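This suggests boldly putting, rather than (7.5),

$$\psi'(\mathbf{q}',t') = \Phi_{\mathbf{k}_u}(\mathbf{q}',t')\,\psi(\mathbf{q}'-\mathbf{u}t',t') , \tag{7.7}$$

with the plane wave

$$\Phi_{\mathbf{k}_u}(\mathbf{q}',t') = e^{i[\mathbf{k}_u\cdot\mathbf{q}'-\omega(\mathbf{k}_u)t']} . \tag{7.8}$$

Instead of (7.6), we now have (recalling that $\mathbf{q}$ stands in the following for
$\mathbf{q}'-\mathbf{u}t'$)

$$\begin{aligned}
i\hbar\frac{\partial}{\partial t'}\psi'(\mathbf{q}',t')
&= i\hbar\frac{\partial}{\partial t'}\Big\{e^{i[\mathbf{k}_u\cdot\mathbf{q}'-\omega(\mathbf{k}_u)t']}\psi(\mathbf{q}'-\mathbf{u}t',t')\Big\}\\
&= e^{i[\mathbf{k}_u\cdot(\mathbf{q}+\mathbf{u}t')-\omega(\mathbf{k}_u)t']}
\Big(\hbar\omega(\mathbf{k}_u)\psi(\mathbf{q},t') - \frac{\hbar^2}{2m}\Delta\psi(\mathbf{q},t')
 - i\hbar\mathbf{u}\cdot\nabla\psi(\mathbf{q},t')\Big) ,
\end{aligned}$$

which equals $-\hbar^2\Delta'\psi'(\mathbf{q}',t')/2m$, as can be seen from the following:

$$\begin{aligned}
-\frac{\hbar^2}{2m}\Delta'\psi'(\mathbf{q}',t')
&= -\frac{\hbar^2}{2m}\Delta'\big[\Phi_{\mathbf{k}_u}(\mathbf{q}',t')\psi(\mathbf{q}'-\mathbf{u}t',t')\big]\\
&= -\frac{\hbar^2}{2m}\big[\Delta'\Phi_{\mathbf{k}_u}(\mathbf{q}',t')\big]\psi(\mathbf{q}'-\mathbf{u}t',t')
 - \frac{\hbar^2}{2m}\Phi_{\mathbf{k}_u}(\mathbf{q}',t')\Delta'\psi(\mathbf{q}'-\mathbf{u}t',t')\\
&\quad - \frac{\hbar^2}{m}\big[\nabla'\Phi_{\mathbf{k}_u}(\mathbf{q}',t')\big]\cdot\big[\nabla'\psi(\mathbf{q}'-\mathbf{u}t',t')\big]\\
&= e^{i[\mathbf{k}_u\cdot(\mathbf{q}+\mathbf{u}t')-\omega(\mathbf{k}_u)t']}
\Big(\hbar\omega(\mathbf{k}_u)\psi(\mathbf{q},t') - \frac{\hbar^2}{2m}\Delta\psi(\mathbf{q},t')
 - i\hbar\mathbf{u}\cdot\nabla\psi(\mathbf{q},t')\Big) ,
\end{aligned}$$
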


where we have made use of $\mathbf{u} = \hbar\mathbf{k}_u/m$ and (7.1).

The transformation (7.7), viz.,

$$\psi'(\mathbf{q}',t') = e^{i\mathbf{k}_u\cdot\mathbf{q}'}\,\psi(\mathbf{q}'-\mathbf{u}t',t')\,e^{-i\omega(\mathbf{k}_u)t'} ,$$

contains the phase factor $e^{-i\omega(\mathbf{k}_u)t'}$ which, if ignored, would result in an additive
constant in the wave equation, i.e., the potential $V$ (when we put that back in) gets
“gauged” by the constant $\hbar\omega(\mathbf{k}_u)$, which should not have any effect on the physics.²
This would imply that the physics is described by an equivalence class of wave
equations which all differ by constants added to the potential. Put another way,
wave functions which differ by a position-independent phase factor describe the
same physics. In technical terms, we can say that the Galilean group is represented
in the projective space consisting of the rays

$$(\psi) := \{c\psi ,\ c \in \mathbb{C} ,\ c \ne 0\} .$$

A common argument to support this goes as follows. The translation $T_{\mathbf{a}}$ and boost
$B_{\mathbf{u}}$ are transformations which commute in the Galilean group, whereas acting on
waves, and ignoring the phase factors $e^{-i\omega(\mathbf{k}_u)t'}$,

$$B_{\mathbf{u}}T_{\mathbf{a}} \ \text{yields}\ \ e^{i\mathbf{k}_u\cdot\mathbf{q}'}\psi(\mathbf{q}'-\mathbf{a}-\mathbf{u}t',t') = \psi'(\mathbf{q}',t') ,$$

$$T_{\mathbf{a}}B_{\mathbf{u}} \ \text{yields}\ \ e^{-i\mathbf{k}_u\cdot\mathbf{a}}e^{i\mathbf{k}_u\cdot\mathbf{q}'}\psi(\mathbf{q}'-\mathbf{a}-\mathbf{u}t',t') = e^{-i\mathbf{k}_u\cdot\mathbf{a}}\psi'(\mathbf{q}',t') ,$$

so that we should view $\psi'(\mathbf{q}',t')$ and $e^{-i\mathbf{k}_u\cdot\mathbf{a}}\psi'(\mathbf{q}',t')$ as equivalent.
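The difference between the naive rule (7.5) and the boost rule (7.7) shows up immediately in a numerical test. The sketch below is an illustration under assumptions: one space dimension, units with $\hbar = m = 1$, a finite superposition of plane waves as the unboosted solution, and hypothetical function names. It evaluates the residual of the free Schrödinger equation by central differences for both candidate transformations.

```python
import cmath

HBAR = 1.0   # units with hbar = 1 (assumption made for the example)
M = 1.0      # particle mass (assumption made for the example)


def omega(k):
    """Galilean dispersion (7.1)."""
    return HBAR * k * k / (2.0 * M)


def psi(q, t, modes):
    """Free solution: finite superposition of plane waves with (k_j, c_j) in modes."""
    return sum(c * cmath.exp(1j * (k * q - omega(k) * t)) for k, c in modes)


def boosted_naive(q, t, u, modes):
    """Rule (7.5): psi'(q', t') = psi(q' - u t', t')."""
    return psi(q - u * t, t, modes)


def boosted(q, t, u, modes):
    """Rule (7.7): additionally multiply by the plane wave with k_u = m u / hbar."""
    k_u = M * u / HBAR
    return cmath.exp(1j * (k_u * q - omega(k_u) * t)) * psi(q - u * t, t, modes)


def schroedinger_residual(f, q, t, h=1e-4):
    """|i hbar df/dt + (hbar^2 / 2m) d^2f/dq^2| by central differences."""
    df_dt = (f(q, t + h) - f(q, t - h)) / (2.0 * h)
    d2f_dq2 = (f(q + h, t) - 2.0 * f(q, t) + f(q - h, t)) / (h * h)
    return abs(1j * HBAR * df_dt + HBAR ** 2 / (2.0 * M) * d2f_dq2)


if __name__ == "__main__":
    modes, u = [(1.0, 1.0), (2.5, 0.5)], 0.8
    print(schroedinger_residual(lambda q, t: boosted(q, t, u, modes), 0.4, 0.9))
    print(schroedinger_residual(lambda q, t: boosted_naive(q, t, u, modes), 0.4, 0.9))
```

The first residual is of the size of the discretization error, while the second is the extra term $-i\hbar\mathbf{u}\cdot\nabla\psi$ of (7.6) and does not go away when `h` is reduced.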

The way the wave function transforms may be a bit upsetting. Why should a 
fundamental description of nature change its appearance so drastically when the 
coordinate frame is changed? Maybe it would all become more understandable if 
we knew what role the $\psi$ function really played? Perhaps the $\psi$ function is not the
primary ontological variable and its role is understood only in connection with some 
primary variables still to be determined, like the B field in electromagnetism, which 
changes its sign under time reversal? In the latter case, we understand that this must 
be so by the role the B field plays in the dynamics of charged particles. 

Schrödinger’s equation is a wave equation, a wave evolving in time. However, the
first use of the equation was to explain the stationary electronic states of an atom,
i.e., wave functions which depend only trivially on time (through a simple phase
factor $e^{-iEt/\hbar}$). They are solutions of the so-called stationary Schrödinger equation.
The contact made with the real world is very indirect, via spectral lines. But what, 
we must ask, is the physical meaning of the wave function? What role does it play? 
What does a traveling (i.e., non-stationary) wave mean? 


² Whether it should or should not have any effect is of course not decidable on a priori grounds.
We should really withhold any definitive claim until we have completely understood what the new
theory is.


7.2 What Physics Must Not Be


Heisenberg also wanted a theory behind the spectral lines, but he did not think of 
waves at all. What he did was in a sense much more ingenious, by seeing the discrete 
spectral energies emerging from a new calculus of arrays of numbers — the matrix 
calculus of infinitely extended matrices. In that view, quantization meant that the 
Poisson bracket of Hamiltonian mechanics is to be replaced by the commutator 
$[\cdot,\cdot]/i\hbar$ and thereby forming an operator algebra. The important output from this
was the famous Heisenberg uncertainty principle, which in a lax manner of speaking 
says that one cannot simultaneously determine the position and momentum of a 
particle. This became the principle of indeterminism in the new physics. 

After Schrödinger had established the equivalence between his wave mechan-
ics and Heisenberg’s matrix mechanics, everything eventually found its place in 
Dirac’s formalism. Dirac established the powerful formalism of quantum mechan- 
ics, by adopting a notation for infinite-dimensional vectors whereby the vectors can 
be effortlessly represented relative to a suitable “orthonormal basis”. This is a tech- 
nical tool of great power, although the terminology “orthonormal basis” is in most 
cases mathematically incorrect. But having said this, the final result of such for- 
mal computations is always correct. Dirac’s formalism is what one usually learns in 
courses on quantum mechanics. 

But let us return to 1926. Heisenberg did not like Schrödinger’s waves, and hoped
that this path would eventually turn into a mere meander. Born, however, immediately
saw the descriptive power of the time-dependent Schrödinger equation, and
applied it to scattering. In this application he discovered that the wave function $\psi$
has an empirical meaning as a “probability amplitude”. In fact, through its modulus
squared $|\psi(\mathbf{x},t)|^2$, the wave function, which is generally a complex function,
delivers the theoretical prediction for the empirical density of finding the particle at
position $\mathbf{x}$ at time $t$.

There are two famous quotes from Born’s two papers. The first [1] is an an- 
nouncement, and the second [2], an elaboration on it. The quotes show that Born 
had an absolutely correct, and one must say frankly ingenious, intuition about the 
meaning of the wave function, namely that it guides the particles and that it
determines the probabilities of particle positions. This probability was understood as
irreducible³ (backed, of course, by the uncertainty principle). The intrinsic randomness
fascinated the scientific community for decades, and the second quote in particular 
shows that “dark” discussions on that were taking place. In truth, quantum random- 
ness is good old Boltzmannian statistical equilibrium, albeit for a new mechanics: 
Bohmian mechanics. But more on that later. Here is the quote from [1]: 


I want to tentatively follow the idea: The guiding field, represented by a scalar function of
the coordinates of all particles taking part and the time, evolves according to Schrödinger’s
differential equation. Momentum and energy will however be transferred as if corpuscles
(electrons) were indeed flying around. The trajectories of these corpuscles are only
restricted by energy and momentum conservation; apart from that the choice of a particular
trajectory is only determined by probability, given by the values of the function $\psi$. One
could, somewhat paradoxically, summarize this as follows: The motion of the particles
follows probabilistic laws, while the probability itself evolves according to a causal law.⁴


³ Born received the Nobel prize for this discovery, although very late and mainly because of
repeated recommendations by Einstein.


In the main paper [2], Born surprisingly felt the need to relativize the strong inde- 
terminism: 


In my preliminary note I have strongly emphasized this indeterminism, since that seems to 
me to best conform with the practice of the experimenter. But anybody who is not satisfied with
that is free to assume that there are more parameters, which have so far not been introduced 
into the theory, and which determine the individual event. In classical physics these are the 
“phases” of the motion, for example the coordinates of the particles at a certain moment. It 
seemed to me at first improbable that one could casually introduce variables which corre- 
spond to such phases into the new theory; but Mr. Frenkel told me that this may be possible 
after all. In any case, this possibility would not change the practical indeterminism of the 
collision processes, since one cannot give the values of the phases; it must lead to the same 
formulas, like the “phaseless” theory proposed here.⁵


Born is close to Bohmian mechanics in his preliminary note, although his naive 
insistence on the Newtonian momentum and energy conservation for the particle 
trajectories⁶ is completely gratuitous and is not in fact correct for Bohmian
mechanics. Why should the new mechanics, which is based on a guidance principle,
obey Newtonian principles? 

In the second quote he says that he thought it improbable (meaning something 
like impossible) that particle trajectories could be introduced in a casual (meaning 
natural) way. But that is exactly what can be done, and it is trivial! The reference 
to Mr. Frenkel is amusing, and historians may find pleasure in finding out what the 
Frenkel story was about, and what he had in mind. 

Why are the quotes so important? Because they show the advent of a new theory
of physics, supplementing Schrödinger’s wave function description by an idea
which can be trivially brought to completion. It could easily have been done by Born
himself. What were Born’s grounds for bringing in probability at all? Of course,
there was Heisenberg’s uncertainty principle, but that alone was not sufficient, as
Schrödinger had pointed out to him. There was an extra equation (also derived
by Schrödinger), an identity which follows from Schrödinger’s equation, which
corroborated the interpretation of the wave function as determining the
probabilities. From that equation the guided trajectories are transparently obvious,
as will be shown in Sect. 7.3. But apart from Born, either Einstein or Schrödinger
could have done that.


4 Ich möchte also versuchsweise die Vorstellung verfolgen: Das Führungsfeld, dargestellt durch
eine skalare Funktion der Koordinaten aller beteiligten Partikeln und der Zeit, breitet sich nach der
Schrödingerschen Differentialgleichung aus. Impuls und Energie aber werden so übertragen, als
wenn Korpuskeln (Elektronen) tatsächlich herumfliegen. Die Bahnen dieser Korpuskeln sind nur so
weit bestimmt, als Energie- und Impulssatz sie einschränken; im übrigen wird für das Einschlagen
einer bestimmten Bahn nur eine Wahrscheinlichkeit durch die Werteverteilung der Funktion $\psi$
bestimmt. Man könnte das, etwas paradox, etwa so zusammenfassen: Die Bewegung der Partikeln
folgt Wahrscheinlichkeitsgesetzen, die Wahrscheinlichkeit selbst aber breitet sich im Einklang mit
dem Kausalgesetz aus.


5 Ich habe in meiner vorläufigen Mitteilung diesen Indeterminismus ganz besonders betont, da
er mir mit der Praxis des Experimentators in bester Übereinstimmung zu sein scheint. Aber es
ist natürlich jedem, der sich damit nicht beruhigen will, unverwehrt, anzunehmen, daß es weitere,
noch nicht in die Theorie eingeführte Parameter gibt, die das Einzelereignis determinieren. In der
klassischen Mechanik sind dies die “Phasen” der Bewegung, z.B. die Koordinaten der Teilchen in
einem bestimmten Augenblick. Es schien mir zunächst unwahrscheinlich, daß man Größen, die
diesen Phasen entsprechen, zwanglos in die neue Theorie einfügen könne; aber Herr Frenkel hat
mir mitgeteilt, daß dies vielleicht doch geht. Wie dem auch sei, diese Möglichkeit würde nichts
an dem praktischen Indeterminismus der Stoßvorgänge ändern, da man ja die Werte der Phasen
nicht angeben kann; sie muß übrigens zu denselben Formeln führen, wie die hier vorgeschlagene
“phasenlose” Theorie.


6 As if the notion of particle alone forced Newtonian behavior.

In fact the equation for the trajectories, albeit interpreted as fluid lines, was im- 
mediately seen by Erwin Madelung (a mathematical physicist and friend of Born) 
[3], and at the 1927 Solvay conference de Broglie introduced the same Madelung 
fluid lines as particle trajectories. But this was ridiculed by all the other participants. 
Einstein in particular found that a guiding wave on configuration space made no 
sense whatsoever: physics must not be like that. However, Einstein had no prob- 
lem with probability being on configuration space, just as the classical canonical 
ensembles are measures on many-particle phase space, so that the statistical part of 
Born’s thesis was fine for him. Einstein nevertheless felt, and Schrödinger likewise,
that the wave function guiding particles on configuration space meant a revolution
in physics, if that turned out to be a true feature of nature.

It is very difficult these days to appreciate what really blocked physicists’ minds 
in this context, preventing them from focusing on the obvious — a new mechanics, 
even if it did turn out to be revolutionary. In our own time people come up with all 
kinds of crazy ideas in physics and are applauded for it. So why did nobody probe 
Born’s guidance idea and see what it could achieve? Why only Mr. Frenkel and 
nobody else? What really prevented anyone from saying loud and clear that particles 
exist and move (what else can they do when they exist)? This is what historians 
should work to find out, because what happened is really beyond understanding. 
But it may take many good historians of science to sort out the mess. 

Ignoring de Broglie’s attempt in the 1927 Solvay conference, and ignoring Mr. 
Frenkel’s and Born’s feeble attempts, the question: What is quantum mechanics re- 
ally about? never quite surfaced as a clear-cut and burning question in the minds 
of physicists, with the exceptions of Einstein and Schrödinger. Instead there was
a muddle of philosophical talk about what Heisenberg’s findings really meant for
physics, and whether Schrödinger’s wave function provided a complete description
of a physical system, in the sense that nothing more is needed (except talk perhaps). 
Early on, Einstein and Bohr were the leading figures in that discussion, and they 
were both right and wrong in some ways. 

Schrödinger originally thought that the wave function provided a complete
description, in that it described matter, but in the end he could not adhere to that view,
because the wave spreads while atomistic matter in all experiments came out point- 
like. Let a wave impinge on a double slit. Then on the screen at the other side of the 
slit, a black point appears randomly (in time and space). 

More dramatically, Einstein and Schrödinger were concerned about the linearity
of the wave equation and entanglement. If the wave function is the complete
description of the physics, this leads to the measurement problem (known most prominently
as Schrödinger’s cat paradox), which we talked about in the introduction and shall
do again in the next section. Schrödinger and Einstein (among a few others) had to
hold on to the view that the quantum mechanical description of nature is incomplete.

In opposition to this, Bohr, Heisenberg, Dirac, Pauli, and most physicists held 
that the new findings necessitated a new philosophical view of nature. The new 
philosophical view was that the question “What is going on in nature?” is unphysi- 
cal, unscientific, and uneverything. In other words, the incompleteness idea became 
rather heavily outlawed. The new philosophical view also endorsed the understand- 
ing that one does not necessarily mean what one says. For example, when one says 
that the particle hits the screen (where the black spot appears in the two slit ex- 
periment), one does not mean that there is a particle hitting the screen, but rather 
something else. However, this something else cannot be expressed other than by 
saying that a particle hits the screen. 

It was as if our human capabilities were not far-reaching enough to actually grasp 
and describe what really happens when we say that a particle hits the screen. The 
mystery which historians have to work out is why there was a need to forbid the 
meaning that a particle hits the screen, when it does seem that a particle hits the 
screen, especially when this is theoretically very simple to describe. Bohmian me- 
chanics does it with consummate ease. The great mystery is why the majority of 
physicists took to heart the idea that physics must not be about ontology. But what 
else could it be about? 

Perhaps the time has come for something positive. In the end all the founding 
fathers of quantum mechanics were right to some extent. Bohr’s insistence that “ob- 
servables” only have meaning in connection with an experiment and represent no 
properties of a system is largely justified. Einstein and Schrödinger were right in
their conviction that the wave function cannot represent the complete description
of the state of a system. Born was right in seeing that, at the end of the day, the
empirical feature emerging from the complete state description is the $|\psi|^2$
statistics. Bohmian mechanics combines all these views into one theory in a surprisingly
trivial manner. 


Remark 7.1. On Solutions of Schrödinger’s Equation

The Schrödinger equation is a linear partial differential equation. As such it does not
conceal any of the exciting features which make nonlinear partial differential
equations so appealing to mathematicians: shock waves, explosions, and so on. And yet
the theory of classical solutions of the Schrödinger equation, i.e., solutions which
are nice differentiable functions and which solve the Schrödinger equation in just
the way a normal person would expect, is not usually textbook material.
Mathematical physics focuses more on the Hilbert space theory of solutions which is connected
to the self-adjointness of the Schrödinger operator and the unitarity of the time
evolution. We also do that in Chap. 14. In Bohmian mechanics, which we discuss in the
next chapter, the wave function must be differentiable, and so one needs classical
solutions of the Schrödinger equation. The relevant assertions about the classical
solutions of Schrödinger’s equation can be found in [4, 5].


7.3 Interpretation, Incompleteness, and $\rho = |\psi|^2$


In books and seminars on quantum mechanics, there is so much talk about interpre- 
tation. One talks about interpretations of quantum mechanics: Copenhagen, many 
worlds, Bohmian, and so on. As if the laws of quantum mechanics were a Delphic
oracle which required high priests to be deciphered. What is special about quantum 
mechanics as compared to Newtonian mechanics, where only a few scientists (in- 
fluenced by quantum mechanics) would insist that Newtonian mechanics needed an 
interpretation? Newton certainly did not think this way, and nor did Leibniz (actu- 
ally the equations in the form we are used to seeing them were written by Leibniz). 

Interpretation has become a multipurpose concept in quantum mechanics. The 
interpretation of the wave function is that $|\psi|^2$ is a probability density. That is
interpretation in the good sense. We shall look at that in this section. Bohmian me- 
chanics is often said to be an interpretation of quantum mechanics, which should 
indicate redundancy, i.e., that it is merely an interpretation. But Bohmian mechan- 
ics is not an interpretation of anything. We shall come to that in the next chapter. 
It should be clear from the above quotes that Born would also have called the
theory with trajectories (with “phases”) a new theory. On the other hand, in some very
vague sense, Bohmian mechanics is an interpretation of quantum mechanics. It is a 
complete theory where nothing is left open, and above all, it does not need an inter- 
pretation. It is a theory of nature, and it has a precise link to quantum mechanics. 
Indeed, it explains its rules and formalisms, so when someone says that the momen- 
tum and the position of a particle are non-commutative operators, which does sound 
like a Delphic oracle, Bohmian mechanics fills in all the ideas needed to see what 
this could possibly mean. 

All interpretations in quantum mechanics are linked to one essential question: 
What is the role of the wave function in physics? As already mentioned, Schrödinger
originally thought that the wave function represented the stuff the world was made 
of — a matter wave, so to speak. Indeed, it is a matter wave, on a strangely high- 
dimensional space, the configuration space of all the particles as it were. But since 
there are no particles, only a wave, it is not the configuration space of particles, but 
simply a curious high-dimensional space. 

One could just live with that. But other things speak against the “interpretation” 
of the wave function as a matter wave. Even thinking of the wave packet of a sin- 
gle electron, that wave packet spreads according to the dispersion relation, whereas 
spread out electrons are not observed. They are always pointlike. Send an electron 
wave through a slit, and the wave evolves after the slit according to Huygens’ prin- 
ciple into a spherical wave. Yet on the photographic plate somewhere behind the 
slit, we see a black spot where the particle has arrived, and not the gray shade of the 
extended wave. 

The idea of matter waves becomes grotesque when one considers measurement
situations and the measurement apparatus is itself described by a matter wave. Since
the Schrödinger evolution is linear, superpositions of wave functions evolve into
superpositions. We arrive at Schrödinger’s cat. We do not really need to repeat this
here, but let us do so anyhow, just to make the point. Suppose a system is described
by two wave functions $\varphi_1$ and $\varphi_2$ and an apparatus is built such that
its interaction with the system (the “measurement”) results in the following situation:
the pointer points to 1 if the system is described by the wave function $\varphi_1$, or
to 2 if the system is described by $\varphi_2$. That is, the apparatus is described by
the two wave functions $\Psi_1$ and $\Psi_2$ (when the pointer points to 1 or 2,
respectively) and the pointer 0 wave function $\Psi_0$, so that

$$
\varphi_i \Psi_0 \;\xrightarrow{\ \text{Schrödinger evolution}\ }\; \varphi_i \Psi_i , \qquad i = 1, 2 . \tag{7.9}
$$

However, the Schrödinger evolution is linear and it follows from (7.9) that, if the
wave function of the system is

$$
\varphi = c_1 \varphi_1 + c_2 \varphi_2 , \qquad c_1, c_2 \in \mathbb{C} ,
$$

then one obtains

$$
\varphi \Psi_0 = (c_1 \varphi_1 + c_2 \varphi_2) \Psi_0 \;\xrightarrow{\ \text{Schrödinger evolution}\ }\; c_1 \varphi_1 \Psi_1 + c_2 \varphi_2 \Psi_2 . \tag{7.10}
$$

This is a bizarre matter wave on the right where the apparatus points simultaneously
to 1 and 2. This conflicts with the way the apparatus was built, or if one prefers,
with what our experience tells us about the way pointers show facts: either 1 or 2.
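
The step from (7.9) to (7.10) uses nothing but linearity, and it may help to see it in a toy model. The following is a minimal sketch, not a model of any real apparatus: the two-state system, the three pointer positions, the permutation rule in `evolve`, and the coefficients `c1`, `c2` are all assumptions made for illustration.

```python
# Toy model of the measurement interaction (7.9). The system has two states
# (index 0 for phi_1, index 1 for phi_2); the pointer has positions 0, 1, 2.
# A joint state is a dict {(system_index, pointer_position): amplitude}.

def evolve(state):
    """Linear 'Schroedinger evolution' of the toy model.

    Basis states are permuted as (s, p) -> (s, (p + s + 1) % 3), so that
    phi_1 Psi_0 -> phi_1 Psi_1 and phi_2 Psi_0 -> phi_2 Psi_2. A permutation
    of basis states, extended linearly, preserves the norm.
    """
    out = {}
    for (s, p), amp in state.items():
        key = (s, (p + s + 1) % 3)
        out[key] = out.get(key, 0) + amp
    return out

c1, c2 = 0.6, 0.8j  # hypothetical coefficients with |c1|^2 + |c2|^2 = 1

# Initial state (c1 phi_1 + c2 phi_2) Psi_0: pointer in the null position.
initial = {(0, 0): c1, (1, 0): c2}
final = evolve(initial)

# The result has the form c1 phi_1 Psi_1 + c2 phi_2 Psi_2, as in (7.10):
# both pointer positions 1 and 2 carry nonzero amplitude.
print(final)
norm = sum(abs(a) ** 2 for a in final.values())
print(norm)
```

Nothing in `evolve` knows about superpositions; the entangled final state is forced by linearity alone, which is the point of the argument.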

Schrödinger was well aware of this, as we explained in the introductory chapter.⁷
The conclusion is that either the Schrödinger evolution is not right, or the description
is incomplete. The Schrödinger evolution not being right lends support to a serious
competitor of Bohmian mechanics, namely the dynamical reduction theory or GRW
theory.⁸ The description not being complete leads straightforwardly to Bohmian
mechanics.

Let us now return to Born. We quoted above from his scattering paper, and the
question is: On what grounds could Born corroborate his intuition for the
probabilistic interpretation of the wave function? With Schrödinger’s help,⁹ an identity
was established for any solution of the Schrödinger equation. This identity involves
the “density” $|\psi(q,t)|^2 = \psi^*(q,t)\psi(q,t)$ and has the form of a continuity
equation, called the quantum flux equation:

$$
\frac{\partial |\psi|^2}{\partial t} + \nabla \cdot j^\psi = 0 , \tag{7.11}
$$

where $\nabla = (\nabla_k)_{k=1,\dots,N} = (\partial_{q_1}, \partial_{q_2}, \dots, \partial_{q_N})$
and $j^\psi = (j_1^\psi, \dots, j_N^\psi)$ is the so-called quantum flux with

$$
j_k^\psi = \frac{\hbar}{2 i m_k} \left( \psi^* \nabla_k \psi - \psi \nabla_k \psi^* \right) = \frac{\hbar}{m_k} \Im\, \psi^* \nabla_k \psi , \tag{7.12}
$$

or in configurational terms, introducing the mass matrix $m$,

$$
j^\psi = \frac{\hbar}{2 i} m^{-1} \left( \psi^* \nabla \psi - \psi \nabla \psi^* \right) = \hbar m^{-1} \Im\, \psi^* \nabla \psi . \tag{7.13}
$$


7 A helpful discussion of the possibility of “assigning matter to the wave function” can be found
in [6].

8 GRW stands for Ghirardi, Rimini, and Weber, who formulated a nonlinear random evolution law
for wave functions in such a manner that, in measurement situations, the theory reproduces the
correct Born statistical law. See [7] for an extensive overview. It is remarkable that this nonlocal
random collapse theory can be formulated in a Lorentz invariant way [8].

9 Born’s first idea was that $|\psi|$ was the probability density, but Schrödinger pointed out that
$|\psi|$ does not satisfy a continuity equation, while $|\psi|^2$ does. This meant that, with the
latter choice, probability would be conserved.

What one usually observes is that, integrating (7.11) over the whole of
(configuration) space and using Gauss’ theorem, the integral over the divergence term yields
zero when $\psi$ falls off fast enough at spatial infinity, whence

$$
\frac{\partial}{\partial t} \int |\psi|^2 \, d^{3N}q = - \int \nabla \cdot j^\psi \, d^{3N}q = - \lim_{R \to \infty} \int_{\partial B_R} j^\psi \cdot d\sigma = 0 , \tag{7.14}
$$

where $B_R$ is a ball of radius $R$. The vanishing of the flux through an infinitely distant
surface will later be taken up in rigorous mathematical terms when we discuss the
self-adjointness of the Schrödinger operator.

So $\int |\psi(q,t)|^2 \, d^{3N}q$ does not change with time, and

$$
\int |\psi(q,t)|^2 \, d^{3N}q = \int |\psi(q,0)|^2 \, d^{3N}q ,
$$


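which is usually expressed by saying that probability does not get lost. The reader
who has absorbed Chap. 2 will have little problem in rephrasing (7.11) as a bona
fide continuity equation. Just write

$$
\nabla \cdot j^\psi = \nabla \cdot \left( \frac{j^\psi}{|\psi|^2} |\psi|^2 \right) =: \nabla \cdot \left( v^\psi |\psi|^2 \right) , \tag{7.15}
$$

so that we have a vector field

$$
v^\psi(q,t) = \frac{j^\psi(q,t)}{|\psi(q,t)|^2} = \hbar m^{-1} \Im \frac{\nabla \psi}{\psi}(q,t) , \tag{7.16}
$$

along which the density $\rho(q,t) = |\psi(q,t)|^2$ is transported, i.e., we have the
continuity equation

$$
\frac{\partial |\psi|^2}{\partial t} + \nabla \cdot \left( v^\psi |\psi|^2 \right) = 0 . \tag{7.17}
$$
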
The integral curves along this vector field are the trajectories Born might have had
in mind. Particles moving along these trajectories are guided by the wave function,
since the vector field is induced by the wave function. But unfortunately Born
insisted on momentum and energy conservation, which do not hold here, because the
integral curves are in fact the Bohmian trajectories. Why was this not accepted and
the theory thereby completed? No one knows. In any case $\rho(q,t) = |\psi(q,t)|^2$
became the accepted interpretation of the wave function: it determines the probability
of finding the system in the configuration $q$ when its wave function is $\psi(q,t)$.
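
To make the flux (7.12) and the velocity field (7.16) concrete, here is a minimal one-dimensional sketch. The wave function `psi` (a Gaussian envelope times a plane wave), the units with hbar = m = 1, and the finite-difference step are assumptions chosen for illustration, not taken from the text.

```python
import cmath

HBAR = 1.0  # units with hbar = 1 (assumption)
MASS = 1.0  # hypothetical particle mass

def psi(x, k=2.0, sigma=1.0):
    """Hypothetical example: Gaussian envelope times the plane wave e^{ikx}."""
    return cmath.exp(-x * x / (4 * sigma ** 2) + 1j * k * x)

def gradient(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def flux(f, x):
    """Quantum flux (7.12) in one dimension: j = (hbar/m) Im(psi^* psi')."""
    return HBAR / MASS * (f(x).conjugate() * gradient(f, x)).imag

def velocity(f, x):
    """Velocity field (7.16): v = j / |psi|^2 = (hbar/m) Im(psi'/psi)."""
    return flux(f, x) / abs(f(x)) ** 2

# For this psi, psi'/psi = -x/(2 sigma^2) + i k, so v should be close to
# hbar k / m at every x, although the flux itself varies with |psi|^2.
for x in (-1.0, 0.0, 0.5):
    print(x, flux(psi, x), velocity(psi, x))
```

A real envelope contributes nothing to the imaginary part, so only the phase of the wave function enters the velocity.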

And here enters another confusing issue. There was a debate about whether the 
right word was “finding” or “is”, because the latter would insist on particles actually 
being there, i.e., the system is in the configuration $q$ with probability $|\psi(q,t)|^2$,
and the former would not say anything of substance. However, the source of the 
debate can be located somewhere else, namely, in the measurement formalism of 
quantum mechanics. Bohmian mechanics will explain that it is correct to say “is” 
for the positions of the system particles, but that it is not correct to say “is” for other 
“observables”. This sounds deeper than it really is, but the clarification of this point 
is absolutely essential for a rational understanding. 

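We now compute the identity (7.11) with $j^\psi$ given by (7.12). Starting with
Schrödinger’s equation

$$
i\hbar \frac{\partial \psi}{\partial t} = - \sum_{k=1}^{N} \frac{\hbar^2}{2 m_k} \Delta_k \psi + V \psi ,
$$

since $V$ is a real function (by time reversibility!), complex conjugation yields

$$
- i\hbar \frac{\partial \psi^*}{\partial t} = - \sum_{k=1}^{N} \frac{\hbar^2}{2 m_k} \Delta_k \psi^* + V \psi^* . \tag{7.18}
$$

Multiply the first equation by $\psi^*$ and the second by $\psi$. The resulting equations have
the same $V \psi \psi^*$ term on the right, so subtract the equations and observe that the
time derivative terms add to a time derivative of $|\psi|^2$. Hence,

$$
\begin{aligned}
i\hbar \frac{\partial |\psi|^2}{\partial t}
&= - \sum_{k=1}^{N} \frac{\hbar^2}{2 m_k} \left( \psi^* \Delta_k \psi - \psi \Delta_k \psi^* \right) \\
&= - \sum_{k=1}^{N} \frac{\hbar^2}{2 m_k} \nabla_k \cdot \left( \psi^* \nabla_k \psi - \psi \nabla_k \psi^* \right) \quad \text{(by the product rule)} \\
&= - i\hbar \, \nabla \cdot j^\psi ,
\end{aligned} \tag{7.19}
$$

with $j^\psi$ given by (7.12).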
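
The identity can also be checked numerically on an explicit solution. The sketch below assumes one particle in one dimension with V = 0, units with hbar = m = 1, and the freely spreading Gaussian packet as the exact solution; the derivatives are plain central differences, so the residual is only expected to vanish up to discretization and rounding error.

```python
import cmath

HBAR = MASS = SIGMA = 1.0  # hypothetical units and initial packet width

def psi(x, t):
    """Freely spreading Gaussian packet, an exact solution for V = 0."""
    s = SIGMA ** 2 + 1j * HBAR * t / (2 * MASS)
    norm = (2 * cmath.pi * SIGMA ** 2) ** -0.25 * (s / SIGMA ** 2) ** -0.5
    return norm * cmath.exp(-x * x / (4 * s))

def rho(x, t):
    """Density |psi|^2."""
    return abs(psi(x, t)) ** 2

def flux(x, t, h=1e-4):
    """Quantum flux (7.12): j = (hbar/m) Im(psi^* d(psi)/dx)."""
    dpsi = (psi(x + h, t) - psi(x - h, t)) / (2 * h)
    return HBAR / MASS * (psi(x, t).conjugate() * dpsi).imag

def residual(x, t, h=1e-4):
    """Left-hand side of (7.11): d(rho)/dt + d(j)/dx, by central differences."""
    drho_dt = (rho(x, t + h) - rho(x, t - h)) / (2 * h)
    dj_dx = (flux(x + h, t) - flux(x - h, t)) / (2 * h)
    return drho_dt + dj_dx

# The density at the origin decreases as the packet spreads, yet the
# residual of the continuity equation stays at the level of numerical noise.
for x in (-2.0, 0.0, 1.3):
    print(x, residual(x, 0.5))
```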

So what happens next? The reader who has come this far and who appreciates 
the statistical mechanics we prepared in previous sections may find the situation 
thrilling. Many analogies may come to mind, and many questions: What is this
particle theory whose trajectories are so clearly stated in the trivial rewriting (7.17)?
What is the status of $\rho = |\psi|^2$ in this theory? And what about the uncertainty
principle? Is that an extra metaphysical principle for the new theory with trajectories?
Does $\rho = |\psi|^2$ bear any analogy with the Liouville measure of classical statistical
mechanics? What new things can be learned from the new particle theory? What
about entanglement? So many exciting questions to ask, and so many easy answers
to give! The following chapters will address all these questions. The simplicity and
ease with which all these issues are explained is stunningly beautiful.


Chapter 8
Bohmian Mechanics


Bohmian mechanics is the new mechanics for point particles. In the equations for
Bohmian mechanics there are parameters $m_1, \dots, m_N$ which we shall call masses.
We do so because in certain physical situations the particles will move along
Newtonian trajectories and then these masses are Newtonian masses, and there is no
point in inventing new names here. Although the theory is not at all Newtonian, it is
nevertheless close in spirit to the Hamilton-Jacobi theory and an implementation of
Born’s guiding idea. The theory is in fact the minimal non-trivial Galilean theory of
particles which move. We already gave the defining ingredients in the last chapter.
Now we shall spell things out in detail.

An $N$-particle system with “masses” $m_1, \dots, m_N$ is described by the positions
of its $N$ particles $Q_1, \dots, Q_N$, $Q_i \in \mathbb{R}^3$. The mathematical formulation
of the law of motion is on configuration space $\mathbb{R}^{3N}$, which is the set of
configurations $Q = (Q_1, \dots, Q_N)$ of the positions. The particles are guided by a function

$$
\psi : \mathbb{R}^{3N} \times \mathbb{R} \to \mathbb{C} , \qquad (q,t) \mapsto \psi(q,t)
$$

in the following way. $\psi$ defines a vector field $v^\psi(q,t)$ on configuration space

$$
v^\psi(q,t) = \hbar m^{-1} \Im \frac{\nabla \psi}{\psi}(q,t) , \tag{8.1}
$$

where $m$ is the mass matrix as in (2.3) and the possible particle trajectories are
integral curves along the vector field:

$$
\frac{dQ}{dt} = v^\psi(Q(t), t) . \tag{8.2}
$$

In other words the $k$th particle trajectory $Q_k(t)$ obeys

$$
\frac{dQ_k}{dt} = \frac{\hbar}{m_k} \Im \frac{\nabla_k \psi}{\psi}(Q, t) , \qquad k = 1, \dots, N . \tag{8.3}
$$

$\psi$ obeys the Schrödinger equation

$$
i\hbar \frac{\partial \psi}{\partial t}(q,t) = - \sum_{k=1}^{N} \frac{\hbar^2}{2 m_k} \Delta_k \psi(q,t) + V(q) \psi(q,t) , \tag{8.4}
$$

where $V$ is a Galilean invariant function. The operator on the right of the Schrödinger
equation is often called the Schrödinger Hamiltonian or simply the Hamiltonian. We
shall use this name in the sequel.

Equations (8.1)—(8.4) define Bohmian mechanics. Looking at them, one grasps 
immediately that this mechanical theory is different from Newtonian mechanics. 
Given the two equations, it is now only a matter of analyzing the theory to see what 
it says about the world. We stress this point: what is required now is not philosophy, 
and not interpretation, but the very thing that physicists are supposed to do best. The 
task is to analyse the equations, and only the equations, and see what they say about 
our world. We shall do that in the following chapters, but first we allow ourselves 
an interlude. 
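
As a first taste of analysing the equations and only the equations, one can integrate the guidance law (8.2) for a wave function whose time evolution is known in closed form. The sketch below assumes one particle in one dimension, V = 0, units with hbar = m = 1, and a Gaussian packet at rest with initial width `SIGMA`; for that packet the velocity field and the trajectories can be written down explicitly, which gives something to compare the numerical integration against. The Runge-Kutta integrator is a generic choice, not part of the theory.

```python
import math

HBAR = MASS = SIGMA = 1.0  # hypothetical units and initial packet width

def velocity(q, t):
    """v^psi(q, t) = (hbar/m) Im(psi'/psi) for the free Gaussian packet

        psi(q, t) ~ exp(-q^2 / (4 s(t))),  s(t) = sigma^2 + i hbar t / (2 m),

    which gives v = q * (hbar / (2 m)) * Im(s) / |s|^2 in closed form.
    """
    im_s = HBAR * t / (2 * MASS)
    return q * HBAR / (2 * MASS) * im_s / (SIGMA ** 4 + im_s ** 2)

def trajectory(q0, t_end, steps=1000):
    """Integrate dQ/dt = v(Q, t) from t = 0 with the classical RK4 scheme."""
    dt = t_end / steps
    q, t = q0, 0.0
    for _ in range(steps):
        k1 = velocity(q, t)
        k2 = velocity(q + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(q + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(q + dt * k3, t + dt)
        q += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return q

q0, t_end = 0.7, 4.0
numeric = trajectory(q0, t_end)

# Closed-form solution for this packet: Q(t) = Q(0) * sqrt(1 + tau^2),
# with tau = hbar t / (2 m sigma^2).
tau = HBAR * t_end / (2 * MASS * SIGMA ** 2)
exact = q0 * math.sqrt(1 + tau ** 2)
print(numeric, exact)
```

The trajectories fan out with the spreading packet, and a particle starting at the center stays there; none of them is a Newtonian free motion with constant velocity.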

Everybody (except perhaps mathematicians) must admit that equation (8.1) looks 
a bit ugly. The vector field was already introduced in the previous chapter [see 
(7.16)]. So in principle we know “where it comes from”. It comes from interpreting 
the squared modulus of the wave function as a probability density, whose transport 
is given by the quantum flux equation (7.11). Bohmian trajectories are nothing but 
the flux lines along which the probability gets transported. The velocity field is thus 
simply the tangent vector field of the flux lines. That is the fastest way, given the 
Schrödinger equation (or for that matter any wave equation which allows for a
conserved current) to define a Bohmian theory, a way which was often chosen by John
Bell [1] to introduce Bohmian mechanics.¹

But one must admit that this does not say much to support the integrity of the 
velocity field, since as we understand it probability is a secondary concept, arising 
only when one analyzes typical behavior. Nevertheless the reader who is eager to 
see the theory at work, and who is happy with the mechanics as given, may skip 
the next section, whose only purpose is to explain that Bohmian mechanics follows 
from arguments of minimality, simplicity, and symmetry.²


1 Bell’s way of defining Bohmian mechanics via the current can be generalized to more general
Hamiltonians (i.e., more general than the usual Schrödinger Hamiltonian) which appear in quantum
field theory [2, 3].


2 The quantum flux as defined by the quantum flux equation (7.11) is not uniquely defined. It
can be changed by adding the curl of a vector field, because the divergence of such an object is
zero. This then leads to the idea that the Bohmian trajectories, when defined as flux lines, are not
uniquely defined, i.e., there exist other versions of Bohmian mechanics [4]. However, these are
neither simple nor minimal in any sense of the words. The flux is nevertheless a good guidance
principle for generalizing Bohmian mechanics to more general Schrödinger equations, which may
involve higher order terms.


8.1 Derivation of Bohmian Mechanics


We wish to find a theory of particles from “first principles”. Moving particles agree 
with our experience of the microscopic world. In a double slit experiment, a particle 
is sent onto a double slit and later caught on a screen. We see the spot on the screen 
(where the particle hits the screen), and since the particle came from somewhere 
else it had to move. 

Particles have coordinates $Q_i \in \mathbb{R}^3$. An $N$-particle system is then given by a
collection $Q = (Q_1, \dots, Q_N) \in \mathbb{R}^{3N}$. Particle positions are the primary
variables we need to be concerned with. Naturally, the particles move according to some law.
The simplest possibility is to prescribe the velocities

$$
\frac{dQ}{dt} = v(Q, t) , \tag{8.5}
$$

where $v$ is a vector field on configuration space $\mathbb{R}^{3N}$ (analogous to the
Hamiltonian vector field on phase space). For simplicity we begin with one particle. We need to
find a Galilean covariant expression for the velocity vector, and the most demanding
transformation for a vector is rotation. From Chap. 3, we recall that the gradient of a
scalar function transforms just right [see (2.35)]. This gives the idea that the velocity
field should be generated by a function $\psi$:

$$
v^\psi(q,t) \sim \nabla \psi(q,t) . \tag{8.6}
$$


Another symmetry which may be informative when we prescribe only the velocity
is time-reversal invariance. The velocity changes sign when $t \mapsto -t$. How can one
cope with that? We are already prepared to have complex conjugation do the job.
Let $\psi$ be complex, then with $t \mapsto -t$, let $\psi(q,t) \mapsto \psi^*(q,-t)$ and
consider only the imaginary part in (8.6):

$$
v^\psi(q,t) \sim \Im \nabla \psi(q,t) . \tag{8.7}
$$

That takes care of time-reversal invariance.
Now consider a Galilean boost with velocity $u$: $v^\psi \mapsto v^\psi + u$, i.e.,

$$
\Im \nabla \psi + u = \Im \nabla \psi' .
$$

This suggests putting $\psi' = e^{i q \cdot u} \psi$ as the simplest possibility. However, it
leads to an adjustment of the velocity, namely,

$$
v^\psi(q,t) = \alpha \, \Im \frac{\nabla \psi}{\psi}(q,t) \tag{8.8}
$$

and

$$
\psi' = e^{i q \cdot u / \alpha} \psi , \tag{8.9}
$$

where $\alpha$ is a real constant with dimensions of [length$^2$/time]. $v^\psi$ is therefore
homogeneous of degree 0 as a function of $\psi$, i.e., $v^{c\psi} = v^\psi$ for all nonzero
$c \in \mathbb{C}$.
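
Both properties of the velocity field (8.8), the behavior under the boost (8.9) and the homogeneity of degree 0, are easy to check numerically. The following sketch works in one dimension; the test function `psi`, the value of `ALPHA`, the boost velocity `u`, and the constant `c` are arbitrary choices made for illustration.

```python
import cmath

ALPHA = 0.5  # hypothetical value of the constant alpha

def psi(q):
    """Hypothetical one-dimensional test function with a nontrivial phase."""
    return cmath.exp(-q * q / 4 + 1j * cmath.sin(q))

def velocity(f, q, h=1e-5):
    """Velocity (8.8): v = alpha * Im(f'/f), with f' from a central difference."""
    df = (f(q + h) - f(q - h)) / (2 * h)
    return ALPHA * (df / f(q)).imag

u = 1.3          # boost velocity
c = 2.0 - 3.0j   # arbitrary nonzero complex constant

def boosted(q):
    """Boosted wave function psi' = e^{i q u / alpha} psi, as in (8.9)."""
    return cmath.exp(1j * q * u / ALPHA) * psi(q)

def scaled(q):
    """Wave function multiplied by a constant."""
    return c * psi(q)

q = 0.4
print(velocity(boosted, q) - velocity(psi, q))  # close to u
print(velocity(scaled, q) - velocity(psi, q))   # close to 0
```

Dividing by the wave function is what makes both checks work: the phase factor contributes exactly $u$, and a constant prefactor cancels.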

So far so good, but how should we choose $\psi$? Its role for the motion of
particles is clear, but what determines the function? We need a Galilean invariant law for
that. The simplest law that comes to mind is the Poisson equation for the
gravitational potential. However, one may find many arguments as to why $\psi$ should be a
dynamical object itself. For example, the fact that $\psi$ transforms “strangely” under
time reversal suggests that $\psi$ should obey an equation which contains time. We are
therefore led to a minor extension of the Poisson equation to the simplest Galilean
invariant dynamical equation for a complex function, observing that time reversal
goes along with complex conjugation:

$$
i \frac{\partial \psi}{\partial t}(q,t) = \beta \Delta \psi(q,t) .
$$


In the last chapter we discussed the equation with $\beta = -\hbar/2m$. The Galilean boost
to a frame with relative velocity $u$ was implemented using the extra factor
$e^{i m u \cdot q / \hbar}$, i.e., using $e^{-i u \cdot q / 2\beta}$, so that

$$
\psi' = e^{-i u \cdot q / 2\beta} \psi .
$$

Comparison with (8.9) shows that $\alpha = -2\beta$. We end up with one-particle Bohmian
mechanics in the form

$$
\frac{dQ}{dt} = v^\psi(Q,t) = \alpha \, \Im \frac{\nabla \psi}{\psi}(Q,t) , \tag{8.10}
$$

where $\psi(q,t)$ is a solution of

$$
i \frac{\partial \psi(q,t)}{\partial t} = - \frac{\alpha}{2} \Delta \psi(q,t) . \tag{8.11}
$$

The linearity of the equation blends well with the homogeneity of the velocity.
Galilean invariance allows the addition of a Galilean invariant, real (because of
time-reversal invariance) scalar function $G(q)$, so that the most general such equation reads

$$
i \frac{\partial \psi(q,t)}{\partial t} = - \frac{\alpha}{2} \Delta \psi(q,t) + G(q) \psi(q,t) . \tag{8.12}
$$

The generalization to many particles is immediate by reading $q$ as a configuration.

Before we do so, we would like to determine the proportionality constant $\alpha$,
by connecting the new fundamental theory with the particle mechanics we already
know: Newtonian mechanics. We suspect that the connection is most easily made via
the Hamilton-Jacobi formulation, at least for short times. In view of the velocity
field (8.10), we write

$$
\psi(q,t) = R(q,t) \, e^{i S(q,t)/\hbar} , \tag{8.13}
$$

with real functions $R$ and $S$. The factor $1/\hbar$ is nothing but a dimensional unit
($\hbar$ will later be recognized as Planck’s constant) to ensure that $S$ has the
dimensions of an action. Then (8.10) becomes

$$
\frac{dQ}{dt} = \frac{\alpha}{\hbar} \nabla S . \tag{8.14}
$$

Hence comparison with the first Hamilton-Jacobi equation suggests that $\alpha/\hbar = 1/m$
[see (2.35)], with $m$ as the mass of the particle. With this identification we recognize
(8.11) as the one-particle Schrödinger equation.

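Of course, one would like to see whether this is consistent with an analogue of
the second Hamilton-Jacobi equation. In fact, introducing (8.13) into (8.12) and
dividing by $e^{iS/\hbar}$, we obtain

$$
i \frac{\partial R}{\partial t} - \frac{1}{\hbar} R \frac{\partial S}{\partial t}
= - \frac{\alpha}{2} \left( \Delta R + \frac{2i}{\hbar} \nabla R \cdot \nabla S + \frac{i}{\hbar} R \, \Delta S - \frac{1}{\hbar^2} R \, (\nabla S)^2 \right) + G R .
$$

Collecting the imaginary parts of that equation yields

$$
\frac{\partial R}{\partial t} = - \frac{\alpha}{2} \left( 2 \nabla R \cdot \frac{1}{\hbar} \nabla S + R \, \frac{1}{\hbar} \Delta S \right) ,
$$

or

$$
\frac{\partial R^2}{\partial t} = - \alpha \frac{1}{\hbar} \nabla \cdot \left( R^2 \nabla S \right) = - \nabla \cdot \left( v^\psi R^2 \right) , \tag{8.15}
$$

while the real parts give

$$
\frac{\partial S}{\partial t} - \frac{\alpha}{2} \hbar \frac{\Delta R}{R} + \hbar G + \frac{1}{2} \frac{\alpha}{\hbar} (\nabla S)^2 = 0 . \tag{8.16}
$$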


Comparing (8.16) with (2.36), the identification α = ħ/m is consistent, but observe
the extra term −ħαΔR/2R which, compared with (2.36), “adds” to the potential
V = ħG. Bohm called this extra term the quantum potential. In any case, if that extra
term is zero (or negligible), the Bohmian trajectories follow classical trajectories.
The term is small, for example, when ψ is essentially a plane wave ψ = e^{ik·q}. Then
S = ħk·q, and obviously

$$ v^\psi = \frac{\alpha}{\hbar}\,\nabla S = \frac{\hbar k}{m}\,. $$

That is, the Bohmian particle moves at least for a short time³ along a straight line
with velocity ħk/m, and then as a classically moving particle we can assign to it a
classical momentum p = mħk/m = ħk.
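
The plane-wave statement can be checked numerically. The following is only a minimal sketch in one dimension, assuming units with ħ = m = 1 and an arbitrarily chosen wave number; the names `psi`, `bohmian_velocity` and `quantum_potential` are illustrative and not part of the theory. It evaluates the velocity (ħ/m) Im(ψ′/ψ) and the extra term −(ħ²/2m)R″/R by finite differences.

```python
import cmath

HBAR = 1.0  # assumption: units with hbar = 1
MASS = 1.0  # assumption: unit mass
K = 2.5     # assumption: illustrative wave number

def psi(x):
    """One-dimensional plane wave exp(i K x)."""
    return cmath.exp(1j * K * x)

def bohmian_velocity(x, h=1e-5):
    """v = (hbar/m) Im(psi'/psi), with psi' from a central difference."""
    dpsi = (psi(x + h) - psi(x - h)) / (2 * h)
    return (HBAR / MASS) * (dpsi / psi(x)).imag

def quantum_potential(x, h=1e-3):
    """-(hbar^2 / 2m) R''/R with R = |psi|, from a second difference."""
    d2r = (abs(psi(x + h)) - 2 * abs(psi(x)) + abs(psi(x - h))) / h**2
    return -(HBAR**2 / (2 * MASS)) * d2r / abs(psi(x))

points = (-1.0, 0.0, 0.7)
velocities = [bohmian_velocity(x) for x in points]
potentials = [quantum_potential(x) for x in points]
print(velocities)  # each entry is close to HBAR * K / MASS
print(potentials)  # each entry is close to zero
```

Since R = |ψ| is constant for a plane wave, the extra term vanishes up to rounding, and the velocity is the same at every point.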


3 For a definite assertion about classical motion we need to know how ψ changes in time, and we
shall say more on this in the next chapter.

Only in special situations are the Bohmian trajectories Newtonian in character,
namely those where the guiding wave can be approximated (locally, i.e., where the 
particle is) by an almost plane wave (where the wave number k may be a slowly 
varying function of position). Such situations may be described as classical. In many 
such scenarios these trajectories can be seen. A particle track in a cloud chamber, or 
the moon on its orbit are examples. But in so-called quantum mechanical situations, 
where the “wavy” character of the guiding wave plays a role (for example, through 
interference effects), the trajectories will be disturbed by the act of observation and 
therefore change under observation. Thus there may in general be a huge differ- 
ence between “measured” trajectories and “unmeasured” trajectories. We shall give 
examples later. 

Heisenberg concluded from this that there are no trajectories, and that one must 
not talk about a particle having a position. Many physicists found that conclusion 
logical and even banned the notion of particle altogether from physics. Here is fur- 
ther subject matter for historians, to find out how this could have happened. After all, 
classical physics is supposed to be incorporated in quantum mechanics, but when 
there is nothing, then there is nothing to incorporate, and hence no classical world. 

The reader may wonder about equation (8.15). What is it good for? Well, look- 
ing at it once again one sees that this is the identity (7.11) which any solution 
of Schrödinger’s equation fulfills. It will be the key to the statistical analysis of
Bohmian mechanics. Having said that, we may add that the identity is for Bohmian 
mechanics what the Liouville theorem is for Hamiltonian mechanics. We shall say 
more about this in a moment. 

Let us first note that generalizing to many particles leads to the defining equations
(8.1)-(8.4), where V = ħG can now be identified as what will be, in classical
situations, the “classical” potential.


Remark 8.1. Bohmian Mechanics Is Not Newtonian 

David Bohm [5, 6] and many others, e.g., [7] present Bohmian mechanics as be- 
ing Newtonian in appearance. This can easily be achieved by differentiating (8.3) 
with respect to t. Next to —VV, the resulting right-hand side contains the extra term 
—V(quantum potential), which has been called the quantum force. But this takes us 
off target. Differentiation of (8.3) with respect to t is redundant. Bohmian mechanics
is a first order theory. In Newtonian mechanics the velocity is a “phase” variable,
i.e., its initial value must be chosen (and can indeed be freely chosen). In Bohmian
mechanics it is not. It becomes a phase variable only in certain situations, which one
may refer to as classical regimes. Casting Bohmian mechanics into a Newtonian mould
is not helpful for understanding the behavior of the trajectories in quantum mechanical
situations, because understanding means first explaining things on the basis of the
equations which define the theory. Redundancies are more disturbing than helpful.
Analogies may of course help, but they are secondary.


8.2 Bohmian Mechanics and Typicality


Now we come to the trajectories. We have a theory of particles in motion and we are 
now eager to compute them in situations of interest. Just as we compute the orbits 
of planets, we would like to compute the orbits of the Bohmian particles around 
the nucleus in an atom. Why should we be eager to do that? Because the theory 
allows us to do that? So the electron moves around the nucleus in some way. Is 
that exciting? Yes, you may say, because all kinds of exciting questions come up. 
For example, Heisenberg’s uncertainty relation seems to contradict the existence of 
trajectories, so who is right? 

Well, computing them would not help to decide on that. When the electron moves 
around the nucleus, why does it not radiate (since the electron will surely undergo 
accelerations) and fall into the nucleus? Compared to orthodox quantum mechanics, 
where there is nothing, nothing can fall into nothing? But joking aside, the mechan- 
ics here is Bohmian, not Newtonian, and one has to see what this new mechanics is 
like. Force and acceleration are not elements of the new theory, so any arguments 
based on that line of thought are off target. But the relevance for Heisenberg’s un- 
certainty principle is something one needs to look into. In fact, the principle is a 
consequence of Bohmian mechanics. Let us see now how this works. 

Boltzmann returns to the stage. What we have is a new mechanics, with the old 
statistical reasoning. The only difference is this. While in the Newtonian world non- 
equilibrium was the key to understanding why the world is the way we see it, equi- 
librium opens the door to understanding the microscopic world. Just as we describe 
a gas in a box by the equilibrium ensemble, we now describe a Bohmian particle 
in quantum equilibrium. While Boltzmann’s reasoning was embarrassed in classical 
statistical mechanics by atypicality, which renders the justification of equilibrium 
ensembles almost impossible, we shall now meet a situation where Boltzmann’s 
reasoning works right down the line. 

Quantum equilibrium refers to the typical behavior of particle positions given the 
wave function. If you ask what happens to irreversibility, atypicality, and the arrow 
of time when everything is in equilibrium, then you must read the last sentence 
again. The particles are in equilibrium given the wave function. Non-equilibrium 
resides within the wave function. The following analogy may be helpful. The wave 
function generates a velocity vector field (on configuration space) which defines the 
Bohmian trajectories. This is the analogue of the idea that the Hamiltonian generates 
a vector field (on phase space) which defines classical trajectories. The Hamiltonian 
also defines the measure of typicality. Analogously, the wave function defines the 
measure of typicality. Does that not sound very much like what Born had in mind 
(see Chap. 7)? 

Therefore, as in classical physics, we seek a measure of typicality, which is dis- 
tinguished by the new physical law. As in the Boltzmann—Gibbs ensemble, we first 
seek such a measure with which we can form a statistical hypothesis about the typ- 
ical empirical distribution over an ensemble of identical subsystems. The starting 
point for finding a measure of typicality is the continuity equation [see (2.3)] for the 
density ρ(q,t) transported along the Bohmian flow of an N-particle system:

$$ \Phi^\psi_{t,t_0}\,q = Q(t,q) = \text{solution of (8.2) with } Q(t_0) = q\,. $$

The equation reads

$$ \frac{\partial\rho(q,t)}{\partial t} + \nabla\cdot\left[v^\psi(q,t)\,\rho(q,t)\right] = 0\,. \tag{8.17} $$


Since the guiding field ψ(q,t) depends in general on time, the velocity field depends
in general on time, and thus there is in general no stationary density. Nevertheless
we require typicality, defined by the guiding field, to be time independent, so we
look for a density which is a time-independent function of ψ(q,t). This property,
called equivariance, generalizes stationarity. That density was found by Born and
Schrödinger, namely ρ = |ψ|² = ψ*ψ. Putting this into (8.17), we obtain

$$ \frac{\partial|\psi|^2}{\partial t} + \nabla\cdot\left(v^\psi|\psi|^2\right) = 0\,, \tag{8.18} $$

which we know is satisfied from the previous chapter, because we compute

$$ v^\psi|\psi|^2 = \frac{\hbar}{m}\,\mathrm{Im}\,\frac{\nabla\psi}{\psi}\,|\psi|^2 = \frac{\hbar}{m}\,\mathrm{Im}\,\psi^*\nabla\psi = j^\psi\,, $$

so that (8.18) is identical to the quantum flux equation (7.11). We may view this
result as analogous to the Liouville theorem. The physics gives us a distinguished
measure of typicality.⁴

So what does all this mean? In fact it tells us that, if Q₀ is distributed according
to the density |ψ(·,0)|² then Φ^ψ_{t,0} Q₀ = Q(t,Q₀) is distributed according to |ψ(·,t)|²,
where ψ(·,t) is the wave function of the system. In terms of expectation values,

$$ \mathbb{E}^{\psi}\big(f(Q(t))\big) = \int f\big(Q(t,q)\big)\,|\psi(q,0)|^2\,d^{3N}q
   = \int f(q)\,|\psi(q,t)|^2\,d^{3N}q = \mathbb{E}^{\psi_t}(f)\,. \tag{8.19} $$

But this is Born’s statistical law, which says that ρ^ψ(q,t) = |ψ(q,t)|² is the
probability density that the particle configuration is Q = q. Of course, in saying this
we assume ψ to be normalized, i.e., ∫|ψ(q,t)|² d^{3N}q = 1. And in particular (as
remarked in the last chapter) ∫|ψ(q,t)|² d^{3N}q is independent of time. In the
Boltzmannian way of thinking we are therefore led to formulate the following statistical
hypothesis, which is the analogue of the Boltzmann–Gibbs hypothesis of thermal
equilibrium in statistical mechanics:


4 Is this density unique? Recall that in classical mechanics the stationarity requirement allows a
great many densities, among which we discussed the microcanonical and canonical examples. In
[8], it is shown that the ρ = |ψ|² density is the unique equivariant measure (under reasonable
conditions of course).

Quantum Equilibrium Hypothesis. For an ensemble of identical systems, each
having the wave function ψ, the typical empirical distribution of the configurations
of the particles is given approximately by ρ = |ψ|². In short, Born’s statistical law
holds.


All this will be elaborated and the hypothesis justified in Chap. 11. 
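
Equivariance can be made concrete in the simplest case. The sketch below is an illustration under stated assumptions, not part of the argument: it takes a free one-dimensional Gaussian packet centred at the origin, with units ħ = m = 1 and an arbitrarily chosen initial spread `SIGMA0`, for which |ψ(·,t)|² is a Gaussian with standard deviation σ(t) = σ₀√(1+τ²), τ = ħt/(2mσ₀²), and (ħ/m) Im(ψ′/ψ) has the closed form coded in `velocity`. Integrating the guiding equation shows that every initial position is stretched by the same factor σ(t)/σ₀, so a |ψ(·,0)|²-distributed ensemble is carried into a |ψ(·,t)|²-distributed one.

```python
import math

HBAR = 1.0    # assumption: units with hbar = 1
MASS = 1.0    # assumption: unit mass
SIGMA0 = 0.5  # assumption: initial position spread of the packet

def tau(t):
    """Dimensionless spreading time hbar t / (2 m sigma0^2)."""
    return HBAR * t / (2 * MASS * SIGMA0**2)

def sigma(t):
    """Standard deviation of |psi(., t)|^2 for the free Gaussian packet."""
    return SIGMA0 * math.sqrt(1 + tau(t)**2)

def velocity(x, t):
    """(hbar/m) Im(psi'/psi) for the centred free Gaussian packet."""
    return (HBAR / MASS) * x * tau(t) / (2 * SIGMA0**2 * (1 + tau(t)**2))

def trajectory(x0, t_end, steps=2000):
    """Integrate dQ/dt = v(Q, t) with the classical Runge-Kutta scheme."""
    x, t, h = x0, 0.0, t_end / steps
    for _ in range(steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(x + h * k3, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

T_END = 3.0
starts = (-1.0, -0.25, 0.1, 0.5, 1.0)
stretch = [trajectory(x0, T_END) / x0 for x0 in starts]
print(stretch)                  # the same factor for every initial position
print(sigma(T_END) / SIGMA0)    # the factor by which the packet has spread
```

The flow maps the k-th quantile of the initial Gaussian to the k-th quantile of the later one, which is exactly what the equality of the two integrals in (8.19) expresses for this example.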


Remark 8.2. On the Existence and Uniqueness of Solutions of Bohmian Mechanics 
In classical mechanics the question of existence and uniqueness of solutions of say 
gravitational motions is exceedingly hard. How about the new mechanics? Can it 
be that there is absolutely no problem with electrons moving around? After all, in 
most textbooks on quantum mechanics, it is asserted that particles cannot have a 
position and move. The equations are there, but perhaps, you might think, they do 
not have solutions. In fact (8.3) looks potentially dangerous, since the wave function 
is in the denominator, and the wave function can have nodes! It can and will be 
zero at various places! Trajectories can run into the nodes, and that finishes the 
mechanics. But typically they do not. One has existence and uniqueness of Bohmian 
trajectories for all times for almost all initial conditions (in the sense of Remarks 
2.1 and 2.5), where “almost all” refers to the quantum equilibrium distribution [9, 
10]. Note that in Bohmian mechanics the wave function must be differentiable, and
therefore one needs the classical solutions of the Schrödinger equation we talked
about in Remark 7.1.


8.3 Electron Trajectories 


Maybe the most exciting feature is the Bohmian trajectory of an “electron” guided 
by the ground state wave function of a hydrogen atom. To find that, one considers 
the one-particle stationary Schrödinger equation with V(q) as the Coulomb potential
for a fixed point charge nucleus, i.e., one solves the eigenvalue equation

$$ -\frac{\hbar^2}{2m}\,\Delta\psi + V(q)\,\psi = E\,\psi\,. $$

The solutions ψ which are normalizable (i.e., square integrable) and bounded are
called eigenstates. The ground state ψ₀ is the one which corresponds to the lowest
eigenvalue. The eigenvalues E are commonly referred to as energy eigenvalues, and 
the lowest energy value is commonly referred to as the ground state energy. Given 
the underlying ideas, which we partly discussed in the lead-up to Schrödinger’s
equation, the name “energy” may seem appropriate. Of course, more elaboration 
would be needed to actually feel comfortable with calling objects of the new theory 
by names which have a clear connotation from classical physics. If we are to call 
the eigenvalues “energy”, they had better bear some convincing relation to what we 
usually mean by energy. 
Fig. 8.1 Ground state wave function distribution (dark regions correspond to high probability) for
an ensemble of hydrogen atoms in the ground state, with |ψ₀|² random electron positions


So let us consider the ground state. In principle, there could be many such states. 
But not for hydrogen. It is a mathematically rigorous statement that the ground state 
can always be chosen real (and everywhere positive). This is more than we need to 
know! If the guiding field is real, then the imaginary part of ∇ψ/ψ is zero, so the velocity
is zero, i.e., Q̇ = 0, and Q = Q₀, so nothing moves! The reader should beware of
the following idea, which is better considered as a bad joke: since the electron does 
not move it cannot radiate, and that explains why the electron does not fall into the 
nucleus. 

But jokes aside, this finding is very much in contradiction with Bohr’s naive 
model of electrons circling the nucleus. Moreover, it sharply contradicts Heisen- 
berg’s uncertainty principle. Although the position has some finite spread (we shall 
say what that means in a moment), the velocity, i.e., the momentum, is precisely 
zero! Is Heisenberg’s uncertainty principle false after all? Of course, it is not! We 
must simply understand what it is telling us, and that means we must understand 
what the momentum spread refers to in quantum mechanics. 

Clearly, Bohmian mechanics is a radical innovation, so different from what one 
naively thought! What else is new? What can we say about Q₀? Only that it is
|ψ₀|²-distributed, by virtue of the quantum equilibrium hypothesis. This means that,
in an ensemble of hydrogen atoms in the ground state (see Fig. 8.1), the empirical
distribution of Q is typically close to |ψ₀|². That can be checked by experiment. The
spread in position is then simply the variance of the |ψ₀|²-distribution.
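
To see what such an ensemble looks like, one can draw positions at random. This is a minimal sketch, assuming the standard form ψ₀ ∝ e^{−r/a₀} of the hydrogen ground state and atomic units with a₀ = 1 (both are assumptions of the example, as are all names in the code). The radial density r²e^{−2r/a₀} is a Gamma law with shape 3 and scale a₀/2, and the direction is uniform on the sphere.

```python
import math
import random

A0 = 1.0  # assumption: Bohr radius in atomic units

def sample_ground_state_position(rng):
    """Draw one electron position from |psi_0|^2 ~ exp(-2 r / A0)."""
    # The radial density r^2 exp(-2 r / A0) is Gamma(shape=3, scale=A0/2).
    r = rng.gammavariate(3.0, A0 / 2.0)
    # Uniform direction on the unit sphere.
    cos_theta = rng.uniform(-1.0, 1.0)
    sin_theta = math.sqrt(1.0 - cos_theta**2)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (r * sin_theta * math.cos(phi),
            r * sin_theta * math.sin(phi),
            r * cos_theta)

rng = random.Random(0)  # fixed seed so that the run is reproducible
positions = [sample_ground_state_position(rng) for _ in range(20000)]
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z in positions]
mean_radius = sum(radii) / len(radii)
mean_x = sum(p[0] for p in positions) / len(positions)
print(mean_radius)  # close to 1.5 * A0, the mean of the Gamma law
print(mean_x)       # close to zero by spherical symmetry
```

Each sampled position stays where it is for all times, since the velocity field of the real ground state vanishes. The randomness lies entirely in the initial configuration.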

Stationary states belonging to higher energy values, called excited states, will
be of the form ψ = f(r, ϑ)e^{iφ} (which is due to the way the angle φ occurs in the
Laplace operator), writing q in spherical coordinates (r, ϑ, φ). Then S = ħφ and

$$ v^\psi(r,\vartheta,\varphi) = \frac{1}{m}\,\nabla S
   = \frac{\hbar}{m}\,\frac{1}{r\sin\vartheta}\,\frac{\partial\varphi}{\partial\varphi}\,e_\varphi
   = \frac{\hbar}{m\,r\sin\vartheta}\,e_\varphi\,, $$

so that we find periodic orbits:

$$ Q(t):\quad r = r_0\,,\quad \vartheta = \vartheta_0\,,\quad
   \varphi(t) = \frac{\hbar}{m r_0^2}\,\frac{1}{\sin^2\vartheta_0}\,t + \varphi_0\,. \tag{8.20} $$

But that should not excite anybody.
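
The circular orbits can be reproduced by integrating the guiding equation directly. The sketch below assumes units with ħ = m = 1 and uses the Cartesian form of the velocity field, (ħ/m)∇φ = (ħ/m)(−y, x, 0)/(x² + y²), which is what (1/m)∇S becomes for S = ħφ; starting point, step size and function names are arbitrary choices of the example.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1
MASS = 1.0  # assumption: unit mass

def velocity(q):
    """(hbar/m) grad(phi) in Cartesian coordinates, phi the azimuthal angle."""
    x, y, _ = q
    rho2 = x * x + y * y
    return (-HBAR * y / (MASS * rho2), HBAR * x / (MASS * rho2), 0.0)

def rk4_step(q, h):
    """One classical Runge-Kutta step for dQ/dt = velocity(Q)."""
    def shifted(k, s):
        return tuple(a + s * b for a, b in zip(q, k))
    k1 = velocity(q)
    k2 = velocity(shifted(k1, 0.5 * h))
    k3 = velocity(shifted(k2, 0.5 * h))
    k4 = velocity(shifted(k3, h))
    return tuple(a + h * (b + 2 * c + 2 * d + e) / 6
                 for a, b, c, d, e in zip(q, k1, k2, k3, k4))

# Initial point in spherical coordinates (r0, theta0, phi0).
R0, THETA0, PHI0 = 2.0, math.pi / 3, 0.2
q = (R0 * math.sin(THETA0) * math.cos(PHI0),
     R0 * math.sin(THETA0) * math.sin(PHI0),
     R0 * math.cos(THETA0))

T_END, STEPS = 1.5, 1500
for _ in range(STEPS):
    q = rk4_step(q, T_END / STEPS)

phi_numeric = math.atan2(q[1], q[0])
phi_formula = HBAR * T_END / (MASS * R0**2 * math.sin(THETA0)**2) + PHI0
rho_numeric = math.hypot(q[0], q[1])
print(phi_numeric, phi_formula)  # the two angles agree
```

The distance from the z-axis and the height z stay fixed along the numerical orbit, while the azimuthal angle advances at the constant rate ħ/(m r₀² sin²ϑ₀).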
What happens to the motion of the particle when it is guided by a superposition
of two real wave functions ψ₁ and ψ₂ for each of which v = 0? It should be noted
that

$$ \psi = \psi_1 + \alpha\,\psi_2\,, \qquad \alpha\in\mathbb{C}\,, $$

may produce a very complicated motion.
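
A small numerical illustration, at one fixed instant of time, may help here. The choice of wave functions is an assumption of the example and is not taken from the text: ψ₁ = sin(πx) and ψ₂ = sin(2πx) are the two lowest real eigenfunctions of a particle in a box on [0, 1], in units with ħ = m = 1. For real α the superposition is real and the velocity vanishes, whereas for α = i it does not.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1
MASS = 1.0  # assumption: unit mass

def psi1(x):
    return math.sin(math.pi * x)

def dpsi1(x):
    return math.pi * math.cos(math.pi * x)

def psi2(x):
    return math.sin(2 * math.pi * x)

def dpsi2(x):
    return 2 * math.pi * math.cos(2 * math.pi * x)

def velocity(x, alpha):
    """(hbar/m) Im(psi'/psi) for psi = psi1 + alpha * psi2."""
    psi = complex(psi1(x) + alpha * psi2(x))
    dpsi = complex(dpsi1(x) + alpha * dpsi2(x))
    return (HBAR / MASS) * (dpsi / psi).imag

X = 0.25
v_real_alpha = velocity(X, 1.0)  # real superposition: no motion
v_imag_alpha = velocity(X, 1j)   # complex superposition: the particle moves
print(v_real_alpha, v_imag_alpha)
```

In a time-dependent treatment the relative phase of the two eigenfunctions rotates, so the velocity field itself changes in time, which is one source of the complicated motion.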

Is it in fact useful to compute Bohmian trajectories in various quantum mechan- 
ical situations we deem of interest? In general, it is not! But in certain asymptotic 
situations, when the trajectories are close to classical ones, that knowledge is useful 
and even crucial for understanding the microscopic physics. That is the case in scat- 
tering theory, which we shall deal with in a later chapter. But what about the above 
negative response? What we learn in general from the trajectories is that, at each 
moment of time t, the particles have positions which are distributed according to the
quantum equilibrium hypothesis, i.e., according to |ψ(q,t)|². Moreover, the theory is
first order:

$$ \dot Q = v^\psi(Q,t)\,. $$

From this it follows that the possible trajectories cannot cross in extended
configuration space ℝ^{3N} × ℝ, while this is possible in Newtonian mechanics.

With this observation, one can easily construct the trajectory picture in the famous
double slit experiment for a stationary wave (see Figs. 8.2 and 8.3). In this
experiment one sends particles one after the other (think of one particle per year, if
you have enough time) towards a double slit. As any wave would do, after passage
through the slits, the guiding wave ψ of every particle forms a diffraction pattern
according to Huygens’ principle. After the slit, this diffracted wave determines the
possible trajectories of the particle. The particles arrive one after the other (every
year if you have the time) at the photographic plate and leave a black spot at their
point of arrival.


Fig. 8.2 Interference of spherical waves after a double slit


Fig. 8.3 Possible trajectories through the double slits. Labels in the figure: “Particles which
arrive here passed the upper slit” and “Particles which arrive here passed the lower slit”

If we wait long enough, the random black spots will eventually form a recognizable
interference pattern, which is essentially the quantum equilibrium |ψ|² distribution⁵
[11]. This is clear enough, because that is what the quantum equilibrium
hypothesis says: in an ensemble of identical particles each having wave function ψ,
the empirical distribution of positions is given by |ψ|². This is a rather dull
observation, and yet the interference pattern of the double slit experiment is often taken
in textbooks as proof that one cannot have moving particles in atomistic physics.
We can say a bit more about the trajectories in the essentially two-dimensional
experimental setup.


1. The trajectories cannot cross the axis of symmetry. 

2. The trajectories move mostly along the maxima (hyperbolas) of |ψ|² and spend
only short times in the valleys of |ψ|² (|ψ|² ≈ 0).

3. The trajectories cross the valleys since, right after the slits, the trajectories expand
radially. This in turn happens because the guiding wave is a spherical wave close
behind the slits (see Fig. 8.2), and while first feeding the nearest maxima, they
must observe quantum equilibrium. Most trajectories will have to lie in the region
of the main maximum (around the symmetry axis, which is the most clearly
visible on the screen), i.e., trajectories must cross over from adjacent maxima to
the main maximum.⁶

4. The arrival spot on the screen is random, in particular the slit through which each
particle goes is random. The randomness is due to the random initial position Q₀
of the particle with respect to the initial wave packet. By always preparing the
same wave packet ψ, one prepares an ensemble of |ψ|²-distributed positions.
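
Point 1 of this list can be illustrated with a strongly simplified model. The sketch below is not the stationary spherical-wave picture of Fig. 8.2. It assumes instead that the transverse coordinate y is guided by the sum of two freely spreading Gaussian packets centred at y = ±D, with units ħ = m = 1. Widths, distances and function names are arbitrary choices of the example. By symmetry the velocity vanishes on the axis y = 0, so a trajectory starting above the axis stays above it, and mirrored initial positions give mirrored trajectories.

```python
import cmath

HBAR = 1.0    # assumption: units with hbar = 1
MASS = 1.0    # assumption: unit mass
SIGMA0 = 0.5  # assumption: initial width of the packet behind each slit
D = 1.5       # assumption: slits centred at y = +D and y = -D

def packet(y, t, centre):
    """Freely spreading Gaussian packet and its derivative with respect to y."""
    s = 1 + 1j * HBAR * t / (2 * MASS * SIGMA0**2)
    g = cmath.exp(-(y - centre)**2 / (4 * SIGMA0**2 * s)) / cmath.sqrt(s)
    dg = -(y - centre) / (2 * SIGMA0**2 * s) * g
    return g, dg

def velocity(y, t):
    """Transverse velocity (hbar/m) Im(psi'/psi) for the two-packet wave."""
    g_up, dg_up = packet(y, t, +D)
    g_lo, dg_lo = packet(y, t, -D)
    return (HBAR / MASS) * ((dg_up + dg_lo) / (g_up + g_lo)).imag

def trajectory(y0, t_end=2.0, steps=4000):
    """Classical Runge-Kutta integration of dy/dt = velocity(y, t)."""
    y, t, h = y0, 0.0, t_end / steps
    path = [y]
    for _ in range(steps):
        k1 = velocity(y, t)
        k2 = velocity(y + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(y + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(y + h * k3, t + h)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        path.append(y)
    return path

upper = trajectory(0.05)    # starts just above the symmetry axis
mirror = trajectory(-0.05)  # mirrored initial position
print(velocity(0.0, 0.7))   # zero on the symmetry axis
print(min(upper))           # stays positive: the axis is never crossed
```

The same code with other initial positions gives a rough impression of how trajectories are channelled towards the maxima, though the quantitative picture of Fig. 8.3 requires the actual two-dimensional wave.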


5 In fact it is the quantum flux across the surface of the photographic plate integrated over time
(see Chap. 16).


6 The double slit trajectory picture Fig. 8.3 can thus be more or less drawn by hand, or by numerical
computation as done by Dewdney et al. [6].

Fig. 8.4 Tunneling through a barrier


In many quantum mechanical situations one computes numbers from a stationary 
analysis, i.e., from a stationary wave picture. One must be clear about the meaning of 
such stationary computations. In the section on scattering theory, we shall examine 
the applicability of stationary pictures, but for now we shall only discuss the simple 
one-dimensional, textbook example of the stationary analysis of tunneling through a 
barrier. The latter is a potential of height V as shown in Fig. 8.4. One solves the time- 
independent Schrédinger equation in the three regions separately, with an incoming 
plane wave from the left: 


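$$ \begin{aligned}
(1)\ \ x<0:\quad & -\frac{\hbar^2}{2m}\,\Delta\psi = E\psi\,, \quad \psi = e^{ikx} + A\,e^{-ikx}\,, \quad k = \frac{\sqrt{2mE}}{\hbar}\,,\\
(2)\ \ 0<x<a:\quad & -\frac{\hbar^2}{2m}\,\Delta\psi + V\psi = E\psi\,, \quad \psi = B\,e^{\rho x} + C\,e^{-\rho x}\,, \quad \rho = \frac{\sqrt{2m(V-E)}}{\hbar}\,,\\
(3)\ \ x>a:\quad & -\frac{\hbar^2}{2m}\,\Delta\psi = E\psi\,, \quad \psi = D\,e^{ikx}\,,
\end{aligned} $$

where the constants A, B, C, D are determined by the requirement that the trajectories
should be differentiable across the boundaries of the regions. This is commonly
expressed by requiring j^ψ to be continuous.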

In the region x < 0, we have a superposition of two wave functions, one incoming 
and one reflected, yielding a complicated current, while for x > a, we have a simple 
current. But the motion is one-dimensional and therefore particle trajectories cannot 
cross each other. Observing that the wave function on the left is in any case periodic, 
there cannot be two types of trajectory on the left side, moving towards and away 
from the barrier. By computation, one finds that all trajectories do in fact move 
towards the barrier, which seems to contradict the idea of back-scattering, since 
particles clearly get reflected. 
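
Both the coefficients and the claim about the direction of motion can be checked with a few lines of code. This is a sketch under stated assumptions: units with ħ = m = 1, illustrative values for energy, barrier height and width, the barrier wave written as Be^{ρx} + Ce^{−ρx}, and the matching conditions taken in their usual textbook form (ψ and ψ′ continuous at x = 0 and x = a). The function names are ours.

```python
import cmath
import math

HBAR = 1.0  # assumption: units with hbar = 1
MASS = 1.0  # assumption: unit mass

def barrier_coefficients(energy, height, width):
    """Solve the matching conditions (psi, psi' continuous) for A, B, C, D."""
    k = math.sqrt(2 * MASS * energy) / HBAR
    rho = math.sqrt(2 * MASS * (height - energy)) / HBAR
    phase = cmath.exp(1j * k * width)
    # Matching at x = width, provisionally with D = 1.
    b = 0.5 * phase * math.exp(-rho * width) * (1 + 1j * k / rho)
    c = 0.5 * phase * math.exp(+rho * width) * (1 - 1j * k / rho)
    # Matching at x = 0 fixes the amplitude of the incoming wave to 1.
    d = 2 / ((b + c) + (rho / (1j * k)) * (b - c))
    return k, rho, (b + c) * d - 1, b * d, c * d, d

def velocity_left(x, k, a_coeff):
    """(hbar/m) Im(psi'/psi) for psi = exp(ikx) + A exp(-ikx), x < 0."""
    psi = cmath.exp(1j * k * x) + a_coeff * cmath.exp(-1j * k * x)
    dpsi = 1j * k * (cmath.exp(1j * k * x) - a_coeff * cmath.exp(-1j * k * x))
    return (HBAR / MASS) * (dpsi / psi).imag

E, V0, WIDTH = 1.0, 2.0, 1.0  # assumption: illustrative values with E < V0
k, rho, A, B, C, D = barrier_coefficients(E, V0, WIDTH)
left_velocities = [velocity_left(x, k, A) for x in (-3.0, -2.0, -1.0, -0.1)]
print(abs(A)**2 + abs(D)**2)  # close to 1
print(left_velocities)        # all positive: motion towards the barrier
```

The velocities on the left are positive everywhere because the flux (ħk/m)(1 − |A|²) is positive and constant, while |ψ|² oscillates. This makes the tension with the idea of back-scattering in the stationary picture explicit.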

The situation becomes more dramatic if the potential barrier becomes infinitely 
high, so that nothing can get through and everything is reflected (A becomes −1).
Then on the left the wave is purely imaginary and nothing moves! Bohmian mechanics
forces us to realize that the stationary picture is an idealization. This idealization
is useful for computing the so-called transmission and reflection coefficients, which
are computed from the ratios of the “incoming” and “outgoing” quantum fluxes,
and are equal to |D|² and |A|². But it is only clear that A and D achieve this meaning
when one considers the physically realistic setting where a wave packet ψ₀ runs
towards the barrier and part of the wave packet gets reflected, while part goes through.

If ψ₀ represents the initial wave packet (the prepared one, which may look
approximately like a plane wave), we can split the support of ψ₀ into two intervals T
and R. T is the area of initial positions giving rise to trajectories which reach x > a
(these are the positions in the front part of the wave packet), and R is the interval
(Q₀ ∈ R, Q̃₀ ∈ T ⟹ Q₀ < Q̃₀) of initial points giving rise to trajectories which
move towards −∞. The probability that the particle starts in T is

$$ \int_T |\psi_0|^2\,dq = |D|^2\,. $$


In the section on scattering theory, we shall say more about the meaning of the 
quantum flux, but we already have a pretty good understanding of how Bohmian 
mechanics works. 


8.4 Spin 


The reader who has come this far must eventually stumble across spin. Granted 
that particles have positions, that accounts for position, maybe even for momentum, 
once we understand what that refers to, but what is spin? How can a point have 
spin? What does rotation even mean for a point? Spin is indeed a truly quantum 
mechanical attribute. Wolfgang Pauli referred to it as nonclassical two-valuedness. 
Spin is in fact as quantum mechanical as the wave function, which is no longer 
simply complex-valued, but can be a vector with two, three, four, or any number of 
components. 

While it is easy to handle and not at all strange or complicated, to explain from 
pure reasoning alone why spin arises would nevertheless go somewhat beyond the 
scope of this textbook. Spin plays a role only in connection with electromagnetic 
fields, and the ultimate reasoning must therefore invoke relativistic physics. We do 
not want to go that far here. So let us be pragmatic about it at this point, and simply 
consider the phenomenological facts. 

A silver atom is electrically neutral and inherits a total magnetic moment from
the spin of its one valence electron. If such an atom is sent through a Stern–Gerlach
magnet, which produces a strongly inhomogeneous magnetic field, the guiding wave
splits into two parts.⁷ Each part follows a trajectory which resembles that of a
magnetic dipole when it is sent through the Stern–Gerlach magnet, and when its
orientation is either parallel (spin +1/2)⁸ or antiparallel (spin −1/2) to the gradient of the
magnetic field.


7 If one sent an electron through the Stern–Gerlach magnet, the splitting would not be as effective,
due to strong disturbances from the Lorentz force which also acts on the electron. The reader
might enjoy the following historical remark on this point. Bohr envisaged the impossibility of ever
“seeing” the electron spin as a matter of principle: the spin being “purely quantum” and the results
of experiments always being classically describable, the spin must not be observable!

Fig. 8.5 In a Stern–Gerlach apparatus (strong inhomogeneous magnetic field), whose pole shoes
are drawn schematically, a spinor wave function splits into two parts. If one of the parts goes
through a Stern–Gerlach magnet of the same orientation, no further splitting occurs. If one of the
parts goes through a magnet with a different orientation, a further splitting occurs

Suppose a Stern—Gerlach setup is oriented in the z-direction, and suppose the 
triangular pole shoe lies in the positive z-direction (see Fig. 8.5). This prepares a 
spin +1/2 and a spin —1/2 guiding wave, one moving towards positive z values 
(upwards), let us say in the direction of the triangular pole shoe, and one moving 
towards negative z values (downwards), i.e., in the direction of the rectangular pole 
shoe. Of course, when the wave packets are clearly separated, the particle will be in 
only one of the two wave packets. If we block let us say the downward-moving wave 
packet by a photographic screen, and if no black point appears, the particle must be 
in the other, upward-moving packet. In a manner of speaking that particle then has 
z-spin +1/2, and one might say that it has been prepared in the z-spin +1/2 state.

If one now sends that packet through a second Stern—Gerlach setup, also oriented 
in the z-direction, that packet does not split, but continues to move further upwards. 
But if the packet is sent through a Stern—Gerlach magnet which is oriented, say, 
in the y-direction (any direction orthogonal to the z-direction), then the wave splits 
again, and with probabilities of 1/2 the particle will be traveling in the positive or 
negative y-direction. 


8 The value 1/2 stands without explanation at this point. For the considerations in this book, we 
may as well replace 1/2 by unity. The reason why one chooses to talk about spin +1/2 has to do 
with putting spin into analogy with rotation. This analogy is not so far-fetched, since spinors, as 
we shall see shortly, are acted on by a special representation of the rotation group. 
Choosing a different non-orthogonal direction, the probabilities for the splitting
vary with the angle. An angle of 90° yields probabilities of 1/2 for up and down, 
while another direction will give more weight to the spin axis closest to the z- 
direction. This has been observed experimentally. 
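
The angle dependence can be made quantitative with the standard quantum mechanical rule, which is not derived in this section and is used here as an assumption: for a packet prepared with z-spin +1/2 and a second magnet tilted by an angle θ against the z-direction, the probabilities for “up” and “down” are cos²(θ/2) and sin²(θ/2). The sketch computes them from the overlap of two-component spinors. For simplicity the tilt is taken in the x–z plane, and the function names are illustrative.

```python
import math

def spin_up_spinor(theta):
    """Eigenvector (eigenvalue +1) of cos(theta) sigma_z + sin(theta) sigma_x."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def spin_down_spinor(theta):
    """Eigenvector (eigenvalue -1) of cos(theta) sigma_z + sin(theta) sigma_x."""
    return (-math.sin(theta / 2), math.cos(theta / 2))

def probability(outcome, prepared):
    """Squared modulus of the scalar product of two normalised spinors."""
    overlap = sum(complex(a).conjugate() * b for a, b in zip(outcome, prepared))
    return abs(overlap) ** 2

z_up = (1.0, 0.0)  # packet prepared with z-spin +1/2
for degrees in (0, 30, 90, 150, 180):
    theta = math.radians(degrees)
    p_up = probability(spin_up_spinor(theta), z_up)
    p_down = probability(spin_down_spinor(theta), z_up)
    print(degrees, round(p_up, 4), round(p_down, 4))
```

At 90° both outcomes have probability 1/2, and for smaller angles the outcome whose axis lies closer to the z-direction carries more weight, in agreement with the description above.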

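The guiding field of the particle must obey the rules of the splitting and the
simplest way to accomplish this is for the wave function itself to possess two degrees
of freedom. It thus becomes a two-component wave function, i.e., a spinor wave
function:

$$ \psi(q) = \begin{pmatrix} \psi_1(q) \\ \psi_2(q) \end{pmatrix} \in \mathbb{C}^2\,. $$

Bohmian mechanics for spinor wave functions is simple. Start with one particle and
write (8.1), viz.,

$$ v^\psi = \frac{\hbar}{m}\,\mathrm{Im}\,\frac{(\psi,\nabla\psi)}{(\psi,\psi)}\,, \tag{8.21} $$

where

$$ (\psi,\varphi) = \left( \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix} , \begin{pmatrix} \varphi_1 \\ \varphi_2 \end{pmatrix} \right) = \sum_{i=1,2} \psi_i^*\,\varphi_i\,, $$

i.e., take the scalar product on the spinor degrees of freedom.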

The Schrödinger equation is replaced by the so-called Pauli equation, which is
an equation for the two-component wave function, in which the potential V is now
a function with Hermitian matrix values. A Hermitian matrix can be decomposed in
a special basis, comprising the Pauli matrices

$$ \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \qquad
   \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \qquad
   \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} , $$

and the unit 2 × 2 matrix E₂:

$$ V(q)E_2 + B(q)\cdot\sigma = V(q)E_2 + \sum_{k=1}^{3} B_k(q)\,\sigma_k\,, $$

where V ∈ ℝ, B ∈ ℝ³, σ := (σ_x, σ_y, σ_z). The Pauli equation for an uncharged, i.e.,
neutral, particle in a magnetic field B then reads:


$$ i\hbar\,\frac{\partial\psi}{\partial t}(q,t) = -\frac{\hbar^2}{2m}\,\Delta\psi(q,t) - \mu\,\sigma\cdot B(q)\,\psi(q,t)\,, \tag{8.22} $$

where μ is a dimensional constant called the gyromagnetic factor. This equation
ensures that ρ^ψ = (ψ, ψ) is equivariant,⁹ i.e., the four-divergence vanishes,

$$ (\partial_t,\nabla)\cdot\left(\rho^\psi, v^\psi\rho^\psi\right) = 0\,, $$

which is another way of expressing the continuity equation. So ρ^ψ is the quantum
equilibrium distribution.

We said at the beginning of the section that we do not want to engage in a dis- 
cussion about the question of how spin arises from pure thought about nature. To 
do this, we would need to rework the arguments leading to the Dirac equation, of 
which (8.22) is a non-relativistic approximation for an uncharged particle like the 
silver atom sent through the Stern—Gerlach apparatus. However, it may be helpful 
to describe the mathematical reason for the appearance of spinors. The point here 
is that the rotation group SO(3) has a double-valued representation, namely SU(2),
the group of unitary 2 x 2 matrices with unit determinant. This is called the spin 1/2 
representation. 

This becomes clear once one understands that SO(3) is a Lie group, i.e., both a 
group and a manifold, and that, in the case of SO(3), it is not simply connected. 
There is thus a topological reason for the double-valued covering. When we talk 
about “identical particles”, we must return to topological considerations, but we 
shall not elaborate on this connection between SO(3) and SU(2). We shall limit 
ourselves to showing the double-valued representation at work. 

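Consider the rotation of a vector through an angle 2γ about the z-axis:

$$ \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} =
   \begin{pmatrix} \cos 2\gamma & -\sin 2\gamma & 0 \\ \sin 2\gamma & \cos 2\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}
   \begin{pmatrix} x \\ y \\ z \end{pmatrix} . $$

Using the Pauli matrices, any vector x can be mapped to a Hermitian matrix

$$ x \mapsto x\,\sigma_x + y\,\sigma_y + z\,\sigma_z = \begin{pmatrix} z & x - iy \\ x + iy & -z \end{pmatrix} , $$

a representation which originates from Hamilton’s quaternions. It is easy to check
that

$$ \begin{pmatrix} z' & x' - iy' \\ x' + iy' & -z' \end{pmatrix} =
   \begin{pmatrix} e^{-i\gamma} & 0 \\ 0 & e^{i\gamma} \end{pmatrix}
   \begin{pmatrix} z & x - iy \\ x + iy & -z \end{pmatrix}
   \begin{pmatrix} e^{i\gamma} & 0 \\ 0 & e^{-i\gamma} \end{pmatrix} , $$

where

$$ U(\gamma) = \begin{pmatrix} e^{-i\gamma} & 0 \\ 0 & e^{i\gamma} \end{pmatrix} \in SU(2)\,. $$


9 The potential matrix being Hermitian goes hand in hand with the Hamilton operator being
Hermitian, which we shall discuss in detail later.

Spinors are now introduced as a kind of square root of the matrix, by writing it as a
sum of dyadic products of ℂ² vectors, which are called spinors in this context: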


Z x-i1 
( : ’) = s1s7 +5087 , 
X+ly —Z 


where s = (a, b)ᵀ ∈ ℂ² and s⁺ = (a* b*). For example,


(°-2)-(Jumr(om 


Then we can view U(γ) as acting naturally by left and right (adjoint) action on
the spinor factors, producing the rotation through 2γ in physical space. So while a
rotation through 2π in physical space brings a vector back to its starting direction,
the spinor rotates through only a half turn in its space. This explains why the spin is
attributed the value of 1/2. We close this short interlude by observing that both U(γ)
and −U(γ) represent the physical rotation through 2γ, which is why this is called a
double-valued representation.
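
The double-valued representation can be seen at work in a few lines. The sketch below is illustrative only: it takes U(γ) to be the diagonal matrix with entries e^{−iγ} and e^{iγ}, maps a vector to the Hermitian matrix xσ_x + yσ_y + zσ_z, conjugates with ±U(γ), and compares the result with the ordinary rotation through 2γ about the z-axis. Matrices are plain nested tuples, and all helper names are ours.

```python
import cmath
import math

def u(gamma):
    """U(gamma) = diag(exp(-i gamma), exp(i gamma)), an element of SU(2)."""
    return ((cmath.exp(-1j * gamma), 0j), (0j, cmath.exp(1j * gamma)))

def dagger(m):
    """Conjugate transpose of a 2 x 2 matrix."""
    return tuple(tuple(m[j][i].conjugate() for j in range(2)) for i in range(2))

def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def to_matrix(x, y, z):
    """Vector -> Hermitian matrix x sigma_x + y sigma_y + z sigma_z."""
    return ((complex(z), complex(x, -y)), (complex(x, y), complex(-z)))

def to_vector(m):
    """Inverse of to_matrix."""
    return (m[1][0].real, m[1][0].imag, m[0][0].real)

def rotate_via_su2(vec, gamma, sign=1):
    """Conjugate the matrix of vec with sign * U(gamma)."""
    um = tuple(tuple(sign * entry for entry in row) for row in u(gamma))
    return to_vector(matmul(matmul(um, to_matrix(*vec)), dagger(um)))

def rotate_about_z(vec, angle):
    """Ordinary rotation of vec about the z-axis."""
    x, y, z = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

v = (0.3, -1.2, 0.8)
gamma = 0.4
print(rotate_via_su2(v, gamma))           # rotation through 2 * gamma
print(rotate_via_su2(v, gamma, sign=-1))  # -U(gamma) gives the same rotation
print(rotate_about_z(v, 2 * gamma))
print(u(math.pi))                         # minus the unit matrix, up to rounding
```

For γ = π the physical rotation is a full turn, while U(π) is minus the unit matrix, so the spinor has only changed sign.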

The Dirac equation is the simplest equation which relates the two Pauli spinors, 
now taken as functions on spacetime. In another representation of the Dirac equa- 
tion, one collects the two Pauli spinors into a four-spinor, also called a Dirac spinor. 
Spinors are thus geometric objects on which the SU(2) elements act naturally, and 
which carry a representation of the rotations. 

We stress once again that this cannot be seen in the ad hoc Bohm—Pauli theory 
presented here. The vector character of the velocity (8.21) comes from the gradient, 
and the remaining spinor character is “traced out” by the scalar product. However, 
when one starts from the Dirac equation, one achieves the more fundamental view 
[12] we alluded to above. Then the velocity vector field can be constructed as a 
bilinear form from the Dirac spinors without invoking any gradient. A timelike vector
j^μ can be written as a sesquilinear product of spinors j^μ = ψ†γ⁰γ^μψ, where
γ^μ, μ = 0,1,2,3, are the so-called γ matrices, 4 × 4 matrices containing the Pauli
matrices as blocks.

One can also view the Dirac equation as the simplest equation which renders j^μ
divergence free, i.e., ∂_μ j^μ = 0, whence j^μ can be viewed as a current, obeying the
continuity equation. The Bohm–Pauli theory emerges as a non-relativistic
approximation of the Bohm–Dirac theory.

But now we must move on. What is of more interest right now is to see how 
the splitting of the wave function emerges from the Pauli equation (8.22). Consider 
a Stern–Gerlach magnet oriented in the z-direction. For simplicity, the inhomogeneous
magnetic field is assumed to be of the form B = (B_x(x,y), B_y(x,y), bz) (such
that ∇·B = 0), and the initial wave function is chosen to be

$$ \Phi_0 = \Phi_0(z)\,\psi_0(x,y)\left[ \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right] , \tag{8.23} $$

where (1, 0)ᵀ and (0, 1)ᵀ are eigenvectors of σ_z with eigenvalues¹⁰ 1, −1. (We have
chosen the same function of position for both spinors merely for notational simplicity.)
By virtue of the product structure of the function of position, the x,y evolution of
the wave function separates from the z evolution, i.e., we obtain two Pauli equations,
one for the x,y evolution and one for the z evolution. This is a straightforward
consequence, whose demonstration is left to the reader.

We do not care about the x, y part, but focus instead on the z part of the evolution. 
We also assume that the velocity in the x-direction is rather large (the particle moves 
quickly through the magnetic field, which has only a small extension in the x, y- 
directions). So there is also a large inhomogeneity of the magnetic field in the x- 
direction, but since the particle is moving quickly, this should not matter much. In 
short, the x velocity essentially determines the amount of time $\tau$ the particle spends 
in the magnetic field. 

Let us now focus on the z motion through the magnetic field. Due to the shortness
of the time $\tau$, we shall also ignore the spreading of the wave function. This 
amounts to ignoring the Laplace term in the equation. Then by linearity, we need 
only concentrate on the parts 

$$\Phi^{(1)}(z,0) = \varphi_0(z)\binom{1}{0} , \qquad \Phi^{(2)}(z,0) = \varphi_0(z)\binom{0}{1} .$$

The wave packet 

$$\varphi_0(z) = \int e^{ikz} f(k)\,dk \tag{8.24}$$

is assumed to be such that $f(k)$ peaks around $k_0 = 0$. We have thus reduced the 
problem to an investigation of 

$$i\hbar\,\frac{\partial\Phi^{(n)}}{\partial t}(z,t) = \mu(-1)^n b z\,\Phi^{(n)}(z,t) , \qquad n = 1,2 .$$


Hence 

$$\Phi^{(n)}(z,t) = \exp\left(-i(-1)^n\frac{\mu b t}{\hbar}\,z\right)\Phi^{(n)}(z,0) ,$$

and after leaving the magnetic field, so that they are once again in free space, the z 
waves of the wave packet (8.24) have wave numbers 

$$\tilde k = k - (-1)^n\frac{\mu b\tau}{\hbar} .$$

10 To avoid confusion, we talk about spin ±1/2, but the number 1/2 and the dimensional factor are 
of no importance for our considerations and absorbed in the factor $\mu$. 
As solutions of the free wave equation, they must therefore have frequency 

$$\omega(\tilde k) = \frac{\hbar\tilde k^2}{2m} .$$

The group velocity of this wave is then (observing that $k_0 = 0$) 

$$\frac{\partial\omega(\tilde k)}{\partial k}\bigg|_{k=k_0} = -(-1)^n\frac{\mu b\tau}{m} ,$$

whence the wave packet $\Phi^{(1)}$ moves in the positive z-direction, i.e., the direction 
of the gradient of B (spin up), while $\Phi^{(2)}$ moves in the negative z-direction (spin 
down). However, the superposition of both wave packets remains with the weights 
defined by the initial wave function (8.23), and hence, in quantum equilibrium, the 
particle will move upwards with probability $|\alpha|^2$. One then says that the particle 
has spin up with probability $|\alpha|^2$. And likewise, the particle moves downwards with 
probability $|\beta|^2$, and one says that it has spin down with probability $|\beta|^2$. Which 
spin the particle “ends up with” depends only on the initial position of the particle 
in the support of the initial wave function. That is where the “probability” comes 
from. 
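
The direction in which each spin component leaves the magnet can be checked with a few lines of arithmetic. The following Python sketch assumes units with ħ = 1, the sign convention in which the n = 1 (spin up) component acquires the positive wave number shift, and purely illustrative values for μ, b, τ and m; the function names are ours and not part of the text.

```python
HBAR = 1.0  # units with hbar = 1 (an assumption made for this example)


def wave_number_after_magnet(k, n, mu, b, tau):
    """Wave number of spin component n (1 = up, 2 = down) after a time tau in the field."""
    return k - (-1) ** n * mu * b * tau / HBAR


def group_velocity(k, m):
    """Derivative of the free dispersion relation omega(k) = hbar * k**2 / (2 * m)."""
    return HBAR * k / m


# Illustrative values only; they are not taken from the text.
mu, b, tau, m = 1.0, 0.5, 2.0, 1.0

for n in (1, 2):
    # The packet is peaked around k0 = 0 before entering the magnet.
    k_tilde = wave_number_after_magnet(0.0, n, mu, b, tau)
    print(n, group_velocity(k_tilde, m))
```

With these values the n = 1 packet obtains a positive and the n = 2 packet a negative group velocity, which is the splitting described above.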

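We close with an observation that will be used later. The “expectation value” of 
the spin is 

$$\frac12\left(|\alpha|^2 - |\beta|^2\right) .$$

This can be computed using the following scalar product, which will play an important
role in later chapters: 

$$\left\langle\Phi_0\,\Big|\,\tfrac12\sigma_z\Phi_0\right\rangle = \frac12\int dx\,dy\,dz\;\Phi_0^*\cdot\sigma_z\Phi_0 = \frac12\left(|\alpha|^2 - |\beta|^2\right) . \tag{8.25}$$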


This is easily verified. 
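
For the reader who wants to see the steps, here is a short sketch of the verification. It assumes that the initial wave function has the product form of (8.23), written here as $\Phi_0 = \varphi_0(z)\psi_0(x,y)[\alpha\binom{1}{0}+\beta\binom{0}{1}]$, that the position factor $\varphi_0\psi_0$ is normalized, and that $\sigma_z$ acts on the two basis spinors with eigenvalues 1 and −1.

```latex
\begin{align*}
\sigma_z\Phi_0
  &= \varphi_0(z)\,\psi_0(x,y)\left[\alpha\binom{1}{0} - \beta\binom{0}{1}\right], \\
\Phi_0^{*}\cdot\sigma_z\Phi_0
  &= |\varphi_0(z)|^2\,|\psi_0(x,y)|^2\left(|\alpha|^2 - |\beta|^2\right), \\
\frac12\int dx\,dy\,dz\;\Phi_0^{*}\cdot\sigma_z\Phi_0
  &= \frac12\left(|\alpha|^2 - |\beta|^2\right)\int dx\,dy\,dz\;|\varphi_0(z)|^2\,|\psi_0(x,y)|^2
   = \frac12\left(|\alpha|^2 - |\beta|^2\right).
\end{align*}
```

The second line uses the orthonormality of the two basis spinors, and the last step uses the normalization of the position factor.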

The a-spin 1/2 spinors are defined as eigenvectors of the matrix $a\cdot\sigma/2$, where the 
a-spin +1/2 spinor and the a-spin −1/2 spinor are orthogonal, and where a stands 
for some arbitrary direction. If we choose 

$$\binom{1}{0} , \qquad \binom{0}{1}$$

as above for the z spinor, then an arbitrary normalized wave function will look like 

$$\Psi(x) = \binom{\psi_1(x)}{\psi_2(x)} .$$

Sending this wave function through the Stern–Gerlach apparatus, the z-spin +1/2 
will appear with probability 

$$\int dx\,\left|\binom{\psi_1(x)}{\psi_2(x)}\cdot\binom{1}{0}\right|^2 = \int dx\,|\psi_1(x)|^2 .$$


Fig. 8.6 Combination of two Stern—Gerlach experiments. Curved lines represent the paths of the 
wave packets 


Finally, to crystallise our ideas on spin, let us consider a thought experiment 
devised by Tim Maudlin.¹¹ Send a y-spin −1/2 particle in the z-direction through 
a Stern–Gerlach apparatus. Then the wave function splits into two wave packets of 
equal weights belonging to a z-spin +1/2 particle and a z-spin −1/2 particle. Now 
let both wave packets pass through a Stern–Gerlach magnet which has opposite 
orientation (rotated through 180°), so that the two packets move towards each other, 
move through each other, and then hit a screen (see Fig. 8.6). If the particle hits the 
screen above (o) the symmetry axis S, then the particle must have z-spin −1/2, and 
if the particle hits the screen at u (below the axis), it must have z-spin +1/2. But 
the symmetry axis is still a topological barrier for the trajectories in a sufficiently 
symmetric setup. (As in the two slit experiment, we can even think of letting the 
wave go through a double slit.) This means that a particle that hits o was always 
above S, while a particle that hits u was always below S. What does this tell us? In 
fact it tells us that the particle changes its spin as a person changes his or her shirt. 
In other words, spin is not a property of a particle. It is dangerously misleading to 
speak of a particle with spin such and such. Spin is a property of the guiding wave 
function, which is a spinor. The particle itself has only one property: position. 

In this thought experiment, the particle “has z-spin +1/2 once and z-spin —1/2 
once”, meaning that the particle is once guided by a z-spin +1/2 spinor and once 
by a z-spin —1/2 spinor. What spin the particle “has”, i.e., in which packet the 
particle ends up, is decided by the initial position of the particle in the support of 
the y-spin initial wave function. In a totally symmetric setup (see Fig. 8.6), all initial 
positions in + will move so as to pass the first Stern–Gerlach apparatus upwards 
(z-spin +1/2), while all initial positions in − will move downwards (z-spin −1/2). 

Let us push this idea a bit further. Returning to Fig. 8.5 and considering the lower 
outcome, imagine a series of Stern—Gerlach setups which split the wave packet fur- 
ther and further by having the magnets oriented as follows: z-direction, y-direction, 
z-direction, y-direction, for each outgoing wave packet. What we have then is ob- 
viously a quantum mechanical version of the Galton board. No matter how many 


11 Private communication. 
magnets we put one after the other, the typical initial position will generate a typical
run through this quantum mechanical Galton board which can be extended in 
principle ad infinitum.¹² 

As a final remark, the wave function for many particles is an element of the 
tensorial spin space.¹³ A typical N-particle wave function is a linear superposition 
of N-fold tensor products of one-particle spinor wave functions. 


8.5 A Topological View of Indistinguishable Particles 


In his talk about future physics entitled Our Idea of Matter,¹⁴ Schrödinger speculates
on what will remain of quantum mechanics, and whether a return to classical 
physics will be possible. Above all else, Schrödinger is firmly convinced that the 
return to discrete particles is impossible. By “discrete particles” he means particles 
in the sense we intend it in this book, that is, point particles which have a position. 
Now what is his reason for such a firm conviction? 

To understand this we must consider the Bose—Fermi alternative. In many- 
particle quantum mechanics, the indistinguishability of particles is encoded in the 
symmetry (boson) or antisymmetry (fermion) of the wave function under exchange 
of particle coordinates. For example, a two-particle wave function of indistinguishable
particles $\psi(x_1,x_2)$ is either symmetric (bosonic), $\psi(x_1,x_2) = \psi(x_2,x_1)$, or
antisymmetric (fermionic), $\psi(x_1,x_2) = -\psi(x_2,x_1)$. According to Schrödinger, this
cannot be if particles exist as discrete entities. Moreover many textbooks suggest that 
the quantum mechanical indistinguishability of particles forbids them from having 
positions. For if they had positions, then we could say that one particle is here and 
one particle is over there. And when they move, the one over here moves along 
here and the one over there moves along over there, so they are in fact perfectly 
distinguishable by the fact that one is here and one is over there. 

The moral seems to be that quantum mechanical indistinguishability of particles 
is something fantastic, something new or even revolutionary, which absolutely ne- 
cessitates the understanding that particles are not particles. But this is nonsense, and 
Schrödinger’s conviction was unfounded. He was completely wrong, and so is any 
statement about the impossibility of a particle reality. Indeed, quite the opposite is 
true. Because one has particles, one can easily understand the symmetry properties 
of the wave function. To put it succinctly, the Bose—Fermi alternative is a straight- 
forward prediction of Bohmian mechanics. 

We wish to give an idea why this is so. What does it mean to have indistinguish- 
able particles? One thing is clearly this: the labeling of particle positions, as we 


12 It may be interesting to note that, according to Wigner [13], von Neumann drew his intuition that 
“deterministic hidden variables” in quantum mechanics are not possible from this Gedankenexperiment.
His intuition seemed to suggest that a “classical deterministic evolution” cannot produce an 
ideal Bernoulli sequence of “infinite” length. 


13 See Chap. 12 for the definition of tensor space. 
14 Unsere Vorstellung von der Materie [14]. 
did in this section when we wrote $Q_i$, $i = 1,\dots,N$, plays no physical role, i.e., it is 
purely for mathematical convenience. The particles are not physically distinguished, 
so it is better not to put any labels at all. But do we not then lose our nice configuration
space picture $\mathbb{R}^{3N}$? The answer is that we do. The true configuration space of N 
indistinguishable particles is this. One has N positions in $\mathbb{R}^3$, and these N positions 
are N points of $\mathbb{R}^3$, so that we have a subset of $\mathbb{R}^3$ with N elements. Therefore the 
configuration space is 

$$\mathcal{Q} = \{ q\subset\mathbb{R}^3 : |q| = N \} , \quad\text{i.e.,}\quad q = \{q_1,\dots,q_N\} ,\ q_i\in\mathbb{R}^3 .$$

That space looks quite natural and simple, but topologically it is not. It is a far richer 
space than $\mathbb{R}^{3N}$. 

For example, take a wave function defined on this space. Then it is a function 
depending on sets. Since the set has elements, it is a function also of elements, but 
as such it is symmetric, since exchanging the order of elements in a set does not 
change the set. Hence if we insist on writing the function $\psi(q)$ as a function of the 
elements $q_i$, $i = 1,\dots,N$, of the set q (this is like introducing coordinates on $\mathcal{Q}$), 
then $\psi(q_1,\dots,q_N)$ will be symmetric under any permutation $\sigma$ of the N indices: 

$$\psi(q_{\sigma(1)},\dots,q_{\sigma(N)}) = \psi(q_1,\dots,q_N) .$$


This is pretty straightforward and implies that indistinguishable particles are bosons, 
i.e., they are guided by symmetric wave functions. 

Interestingly, this is only half the truth. Since the discovery of Pauli’s princi- 
ple, usually phrased as saying that two or more electrons cannot occupy the same 
quantum state, together with the relativistic Dirac equation for electrons,¹⁵ physicists
have known that electrons are guided by antisymmetric wave functions. In the 
scalar case, these are wave functions for which 

$$\psi(q_{\sigma(1)},\dots,q_{\sigma(N)}) = \operatorname{sign}(\sigma)\,\psi(q_1,\dots,q_N) ,$$

where $\operatorname{sign}(\sigma)$ is the signature of the permutation, i.e., −1 if $\sigma$ decomposes into 
an odd number of transpositions, and +1 if it decomposes into an even number of 
transpositions. 


15 Dirac’s equation builds heuristically on a square root of the negative Laplacian, and contains 
a continuum of negative as well as positive energy states. To rule out unstable physics, Dirac in- 
vented the Dirac sea, which ensures that all negative energy states are occupied by particles. By 
Pauli’s principle, no more than two particles can occupy one state, therefore particles of negative 
energy cannot radiate energy and thereby acquire an even more negative energy, because all nega- 
tive energy states are already occupied. If that were not so, Dirac’s equation would yield nonsensi- 
cal physics, because all electrons would radiate endlessly, acquiring ever more negative energies. 
Dirac’s sea is a way out, and it requires that electrons be described by antisymmetric wave func- 
tions (which is the mathematical formulation of Pauli’s principle). Dirac’s sea is reformulated in 
modern quantum field theory by using the notion of vacuum together with creation and annihilation 
operators which satisfy so-called anticommutation relations, and which describe the net balance of 
particles missing in the sea and particles “above” the sea. 
But how can one understand that there are also antisymmetric wave functions? In 
fact, there is a principle that can be supported in the context of quantum field theory 
which says that half-integer-valued spinor representations of the rotation group for 
many particles must be antisymmetric wave functions, while integer-valued repre- 
sentations of the rotation group are to be described by symmetric wave functions. In 
the former case, the wave functions are said to be fermionic, and the particles guided 
by such wave functions are called fermions. In the latter case, they are described as 
bosonic, and the guided particles are called bosons. This principle is commonly 
referred to as the spin—statistics theorem. 

However, we cannot go that far here, since a complete analysis would have to 
be relativistic. What we can do, however, is to explain a little further just where the 
antisymmetric wave functions are hiding when we look at things from a nonrela- 
tivistic standpoint (for an elaboration of topological effects in Bohmian mechanics, 
see [15]). To this end we observe a crucial topological characteristic of the configuration
space $\mathcal{Q}$. This space can be expressed as 

$$\mathcal{Q} = (\mathbb{R}^{3N}\setminus\Delta^{3N})/S_N =: \mathbb{R}^{3N}_{\neq}/S_N .$$

The right-hand side means that we delete from $\mathbb{R}^{3N}$ the diagonal set 

$$\Delta^{3N} = \{(q_1,\dots,q_N)\in\mathbb{R}^{3N} \mid q_i = q_j \text{ for at least one } i\neq j\} ,$$

to yield the set we have denoted by $\mathbb{R}^{3N}_{\neq} := \mathbb{R}^{3N}\setminus\Delta^{3N}$, and then identify all those 
N-tuples in $\mathbb{R}^{3N}_{\neq}$ which are permutations of one another. The latter construction of 
equivalence classes is expressed as factorization by the permutation group $S_N$ of N 
objects. 

To understand what this factorization can do, let us consider a much simpler 
manifold, namely a circle, which arises from [0,1] by identifying the points 0 and 
1, or if one so wishes, from $\mathbb{R}$ by identifying the integers $n\in\mathbb{Z}\subset\mathbb{R}$ with zero. In 
terms of factorization, this is expressed by $\mathbb{R}/\!\sim$, where $x\sim y$ if and only if one has 
$x - y\in\mathbb{Z}$. $\mathbb{R}$ is topologically as simple as anything could be, but the circle is not. It 
contains closed curves winding around the circle, which cannot be homotopically 
deformed to a point. All closed curves on the circle fall into different equivalence 
classes (indexed by the winding number of the closed curve, which can be positive 
as well as negative). A class consists of all curves which can be homotopically 
deformed into one another. Closed curves with different winding numbers belong 
to different classes (a curve which winds around the circle 3 times clockwise and 
once counterclockwise belongs to class −2) and the set of classes can be made 
(with a rather straightforward definition of multiplication by joining curves) into a 
group called the fundamental group $\Pi$ of the manifold. In the example of the circle, 
$\Pi = \mathbb{Z}$. 
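
The classification of closed curves on the circle by their winding number can be made concrete with a small computation. The following Python sketch lifts a sampled closed curve on ℝ/ℤ to ℝ and reads off the net number of turns. The function names, the sampling convention (points in [0,1), consecutive samples less than half a turn apart) and the choice of counting counterclockwise as positive are assumptions made for this illustration.

```python
def winding_number(samples):
    """Winding number of a closed curve on the circle R/Z.

    `samples` lists points of the circle as numbers in [0, 1); the curve is
    closed by returning from the last sample to the first. Consecutive
    samples are assumed to be less than half a turn apart, so that every
    step has a unique shortest lift to R.
    """
    total = 0.0
    n = len(samples)
    for i in range(n):
        step = samples[(i + 1) % n] - samples[i]
        # Pick the representative of the step in [-1/2, 1/2); summing these
        # representatives is the lift of the curve to the covering space R.
        step = (step + 0.5) % 1.0 - 0.5
        total += step
    return round(total)


def loop(turns, points_per_turn=8):
    """Sample a curve winding `turns` times around the circle (negative = clockwise)."""
    n = abs(turns) * points_per_turn
    return [(turns * k / n) % 1.0 for k in range(n)]


# Three clockwise turns followed by one counterclockwise turn.
curve = loop(-3) + loop(1)
print(winding_number(curve))  # prints -2
```

Joining two curves adds their winding numbers, which is the group structure of ℤ mentioned above.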

So the first lesson to learn from factorizing a topologically simple manifold (e.g., 
a simply connected manifold, which means that all closed curves can be deformed 
homotopically to a point) by an equivalence relation is that the resulting manifold 
will not in general be topologically simple (e.g., it may become multiply connected). 
By analogy, $\mathbb{R}^{3N}_{\neq}/S_N$ will be multiply connected. The manifold will have “holes”, so 
that classes of closed curves exist. We now switch to a different viewpoint. We wish 
to define a Bohmian vector field on such a strange manifold. The Bohmian vector 
field is the gradient of the phase of the wave function, and the phase is therefore 
the anti-derivative of the vector field. This parallels the integration of a vector field 
along a curve in the punctured plane, e.g., $1/z$ integrated over a circle around zero 
in the complex plane $\mathbb{C}\setminus\{0\}$. To construct the global anti-derivative, namely $\ln z$, 
Riemann sheets were invented. Quite analogously, one invents a covering of the 
circle by a winding staircase, so that the covering is R twisted into a spiral. 

For the manifold $\mathcal{Q} = \mathbb{R}^{3N}_{\neq}/S_N$, the analogous construction leads to the universal
covering space $\hat{\mathcal{Q}} = \mathbb{R}^{3N}_{\neq}$, which is a simply connected manifold covering the 
multiply connected manifold in the following sense. The covering space is mapped 
by a local diffeomorphism $\pi : \hat{\mathcal{Q}}\to\mathcal{Q}$ to the basis manifold, where the map is also 
referred to as a projection. This map is a coordinate map. What is local about it? 
Well, if one moves one flight up the staircase, one arrives at different (permuted) 
coordinates for the same point of the basis manifold. To any open neighborhood 
$U\subset\mathcal{Q}$ there corresponds a set of coordinate neighborhoods (leaves) $\pi^{-1}(U)$. All 
these different coordinates make up a covering fiber $\pi^{-1}(q)$, which is the set of all 
points in $\hat{\mathcal{Q}} = \mathbb{R}^{3N}_{\neq}$ which are projected down to q. If $q = \{q_1,\dots,q_N\}$, then 

$$\pi^{-1}(q) = \{(q_{\sigma(1)},\dots,q_{\sigma(N)}) : \sigma\in S_N\} .$$


Now there is another group, the group of covering transformations. A covering 
transformation is an isomorphism which maps the covering space to itself, while 
preserving the covering fibers. The group of covering transformations is denoted 
by $\mathrm{Cov}(\hat{\mathcal{Q}},\mathcal{Q})$. For two elements $\hat q$ and $\hat r$ in the same covering fiber, there is one 
element $\Sigma\in\mathrm{Cov}(\hat{\mathcal{Q}},\mathcal{Q})$ such that $\hat q = \Sigma\hat r$. Since the fiber consists of permuted
N-tuples, it is clear that $\mathrm{Cov}(\hat{\mathcal{Q}},\mathcal{Q})$ is isomorphic to the permutation group $S_N$. It can 
be shown (and this may not come as a surprise) that the fundamental group $\Pi(\mathcal{Q})$ 
is isomorphic to $\mathrm{Cov}(\hat{\mathcal{Q}},\mathcal{Q})$, whence the fundamental group is isomorphic to $S_N$. 
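
To make the covering fiber and the covering transformations tangible, here is a small Python sketch for N = 3 points. It models a point of the configuration space as a `frozenset` of positions and a point of the covering space as an ordered tuple. The helper names `project`, `fiber` and `act`, and the particular toy positions, are assumptions made for this illustration only.

```python
from itertools import permutations


def project(q_hat):
    """The projection pi: forget the labels, keep only the set of positions."""
    return frozenset(q_hat)


def fiber(q):
    """All labeled N-tuples of the covering space that project down to q."""
    return set(permutations(sorted(q)))


def act(sigma, q_hat):
    """Covering transformation: relabel the entries of q_hat by the permutation sigma."""
    return tuple(q_hat[sigma[i]] for i in range(len(q_hat)))


# A toy configuration of N = 3 distinct points in R^3.
q = frozenset({(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)})
points = fiber(q)
S3 = list(permutations(range(3)))

print(len(points))                                   # 6 = 3! points in the fiber
print(all(project(q_hat) == q for q_hat in points))  # True

# Any two points of the fiber are related by exactly one permutation.
unique = all(
    sum(act(sigma, r_hat) == q_hat for sigma in S3) == 1
    for q_hat in points
    for r_hat in points
)
print(unique)                                        # True
```

The uniqueness check only works because the diagonal has been removed: with coinciding positions, several permutations would map a tuple to itself.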

We have now a good grasp of the connectedness of the true configuration space 
2, on which we wish to define a Bohmian vector field. We can thus look for wave 
functions $\hat\psi$ defined on the covering space. However, we must require such wave 
functions to behave properly under the projection along the fibers. Proper behavior 
means that the wave function must obey a certain periodicity condition. This is plain 
to see, otherwise the wave function cannot define a vector field on the configuration 
space. To begin with, $\hat\psi$ defines a Bohmian vector field $\hat v^{\hat\psi}$ on $\hat{\mathcal{Q}}$ in the usual way, 
and that vector field will have to be projected down to the configuration space. A 
vector field $\hat v$ on $\hat{\mathcal{Q}}$ is projectable¹⁶ if and only if 

$$\pi(\hat q) = \pi(\hat r) \;\Longrightarrow\; \hat v(\hat q) = \hat v(\hat r) .$$


16 The projection $\pi$ generates a push forward $\pi_*$ on vector fields which does the job of projecting 
the vector field to the manifold, but we do not want to formalize this any further. 
The periodicity condition for the wave function is as follows. For $\Sigma\in\mathrm{Cov}(\hat{\mathcal{Q}},\mathcal{Q})$, 

$$\hat\psi(\Sigma\hat q) = \gamma_\Sigma\,\hat\psi(\hat q) ,$$

where $\gamma_\Sigma\in\mathbb{C}\setminus\{0\}$. But this implies that $\hat\psi$ must be an element of the representation
space of the group of covering transformations represented by $\mathbb{C}\setminus\{0\}$, since 

$$\hat\psi(\Sigma_2\circ\Sigma_1\hat q) = \gamma_{\Sigma_2}\gamma_{\Sigma_1}\hat\psi(\hat q) .$$

When we require that $|\gamma_\Sigma|^2 = 1$, i.e., $|\gamma_\Sigma| = 1$, we can project the equivariant evolution
of the Bohmian trajectories on the covering space to the motion on the configuration
space, where the probability density 

$$|\hat\psi(\Sigma\hat q)|^2 = |\gamma_\Sigma|^2\,|\hat\psi(\hat q)|^2 = |\hat\psi(\hat q)|^2$$

projects to a function $|\psi(q)|^2$ on $\mathcal{Q}$. We thus obtain a unitary group representation, 
also called the character representation of the group. Translating this into coordinate 
language, we have 

$$\hat\psi(\Sigma\hat q) = \hat\psi(q_{\sigma(1)},\dots,q_{\sigma(N)}) = \gamma_\sigma\,\hat\psi(q_1,\dots,q_N) , \tag{8.26}$$


where we put the element from the group of covering transformations in correspon- 
dence with a permutation. 

This is a formula that can be found in standard textbooks (perhaps without the 
hats), and we could have written it down immediately at the beginning of this sec- 
tion. Since we started the chapter with labeled particles and labeling is unphysical, 
we could have concluded that, because the $|\psi|^2$-distribution must not change under 
permutation of the arguments, the wave function should at best change by a phase 
factor when labels are permuted. But then we would have missed the truth behind 
indistinguishability, which lies in the topology of the configuration space. 

But let us end the argument as briefly as possible. Consider (8.26) for $\sigma$ a transposition
$\tau$. We have $\tau\circ\tau = \mathrm{id}$, and hence $\gamma_\tau^2 = 1$, so that $\gamma_\tau = \pm1$. So there we 
are. There exist two character representations of the permutation group, one with 
$\gamma_\tau = 1$, which takes us back to the bosonic wave functions, and one with $\gamma_\tau = -1$, 
which leads us to the fermionic wave functions. And moreover, we found that this 
is all there is. But as we pointed out earlier, this is not all there is to say. The whole 
construction should be done for spinor-valued wave functions. This is a bit more 
involved, but the principle is the same. On the basis of our present knowledge, the 
spin-statistics theorem seems not to be purely grounded on topology, and we shall 
not say more on this issue. 
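
The two character representations can be checked by brute force for a small permutation group. The following Python sketch verifies, for N = 4, that both the constant map +1 and the signature are multiplicative on the permutation group, while a map that fails to send the identity to 1 is not. This is only a finite check under our own encoding of permutations as tuples, not a proof; the function names are assumptions for this example.

```python
from itertools import permutations


def compose(s, t):
    """Composition s o t of permutations given as tuples: (s o t)(i) = s[t[i]]."""
    return tuple(s[t[i]] for i in range(len(t)))


def sign(s):
    """Signature of a permutation, computed from the parity of its inversions."""
    n = len(s)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
    return -1 if inversions % 2 else 1


def is_character(gamma, group):
    """Check gamma(s o t) == gamma(s) * gamma(t) for all s, t in the group."""
    return all(gamma(compose(s, t)) == gamma(s) * gamma(t)
               for s in group for t in group)


S4 = list(permutations(range(4)))
print(is_character(lambda s: 1, S4))    # True: the bosonic choice
print(is_character(sign, S4))           # True: the fermionic choice
print(is_character(lambda s: -1, S4))   # False: not multiplicative
```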

Here is a final remark. Let us return to $\mathbb{R}^{3N}_{\neq} := \mathbb{R}^{3N}\setminus\Delta^{3N}$. Since the diagonal 
is taken out, this will produce holes in the space, and one may wonder whether this 
now allows for closed curves around those holes that cannot be deformed to zero 
curves. But the codimension of the diagonal set is the dimension of the physical 
space d. Consider therefore the following analogy. In a space of d = 3 dimensions, 
a set of codimension 3 is a point, and removing a point from $\mathbb{R}^3$ does not produce 
anything interesting, because all closed curves can still be deformed while avoiding 
the point. In d = 2, the diagonal has codimension two. That corresponds to a line in 
$\mathbb{R}^3$, and removing a line makes the space multiply connected! 

The moral is that the configuration space of indistinguishable particles in two 
dimensions is again multiply connected, and its fundamental group is no longer 
isomorphic to the permutation group, but rather to a group which takes into account 
the extra obstacles arising from taking away the diagonal. This group is called the 
braid group. It allows for the continuum of characters of unit length, so any phase 
factor is allowed. The wave functions have therefore been called anyons. 


References 


1. J.S. Bell: Speakable and Unspeakable in Quantum Mechanics (Cambridge University Press, 
Cambridge, 1987) 
2. D. Dürr, S. Goldstein, R. Tumulka, N. Zanghì: Phys. Rev. Lett. 93 (9), 090402, 4 (2004) 
3. D. Dürr, S. Goldstein, R. Tumulka, N. Zanghì: J. Phys. A 38 (4), R1 (2005) 
4. E. Deotto, G.C. Ghirardi: Found. Phys. 28 (1), 1 (1998) 
5. D. Bohm: Physical Review 85, 166 (1952) 
6. D. Bohm, B.J. Hiley: The Undivided Universe (Routledge, London, 1995). An ontological 
interpretation of quantum theory 
7. P.R. Holland: The Quantum Theory of Motion (Cambridge University Press, Cambridge, 1995) 
8. S. Goldstein, W. Struyve: J. Stat. Phys. 128 (5), 1197 (2007) 
9. K. Berndl, D. Dürr, S. Goldstein, G. Peruzzi, N. Zanghì: Commun. Math. Phys. 173 (3), 647 
(1995) 
10. S. Teufel, R. Tumulka: Commun. Math. Phys. 258 (2), 349 (2005) 
11. A. Tonomura, J. Endo, T. Matsuda, T. Kawasaki, H. Ezawa: American Journal of Physics 57, 
117 (1989) 
12. D. Dürr, S. Goldstein, K. Münch-Berndl, N. Zanghì: Physical Review A 60, 2729 (1999) 
13. E. Wigner: American Journal of Physics 38, 1005 (1970) 
14. E. Schrödinger: Unsere Vorstellung von der Materie. CD-ROM: Was ist Materie?, Original 
Tone Recording, 1952 (Suppose Verlag, Köln, 2007) 
15. D. Dürr, S. Goldstein, J. Taylor, R. Tumulka, N. Zanghì: Ann. Henri Poincaré 7 (4), 791 (2006) 




Chapter 9 
The Macroscopic World 


We need to cope with an unwanted heritage, namely the idea that physics is about 
measurements and nothing else. Bohmian mechanics is obviously not about mea- 
surements. However, since quantum mechanics is about measurements and at the 
same time plagued with the measurement problem, and since Bohmian mechanics 
is supposed to be a correct quantum mechanical description of nature, one may want 
to understand how Bohmian mechanics resolves the measurement problem. Mea- 
surements, the reading of apparatus states, belong to the non-quantum, i.e., classical 
world. Therefore the more general question, and a question of great interest on its 
own is this: How does the classical world arise from Bohmian mechanics? Concerning
the latter we shall never, except for now, talk about $\hbar\to0$. While this is 
a mathematically sensible limit procedure, it is physically meaningless because $\hbar$ 
is a constant that is not equal to zero. The physically well-posed question here is: 
Under what circumstances do Bohmian systems behave approximately like classical 
systems? 


9.1 Pointer Positions 


Since Bohmian mechanics is not about measurements but about ontology, namely 
particles, it has no measurement problem. However, since quantum mechanics does 
have the measurement problem, one may wonder whether Bohmian mechanics 
might not have another problem, namely that it is not a correct description of nature. 
As we shall argue, Bohmian mechanics is a correct description. To show this it may 
be quite helpful to phrase the measurement process in Bohmian terms. 

First note that the very term “measurement process” suggests that we are concer- 
ned with a physical process during which a measurement takes place. But this in turn 
suggests that something is measured, and we are compelled to ask what this quantity 
is. In Bohmian mechanics two things spring to mind as quantities that could be measured:
particle position and wave function. Apart from having a position, particles 
have no further properties. The wave function on the other hand does two things. 

Fig. 9.1 Pointers in physical space and the supports of the pointer wave function in configuration 
space 


Firstly, it guides the particle’s motion, and secondly, it determines the statistical dis- 
tribution of its position. The latter fact already suggests that at least the statistical 
distribution should be experimentally accessible, i.e., the particle’s position should 
be measurable, for example by putting a photographic plate behind a double slit. 

Even more famous than position measurement is “momentum measurement”, 
since the Heisenberg uncertainty relation is about momentum and position mea- 
surements. But momentum is not a fundamental notion in Bohmian mechanics. On 
the other hand, particles do have velocities, so can they be measured? We need to 
say something about that, too. At the end of the day the moral to be drawn is sim- 
ply this message from Bohr: in most cases a measurement is an experiment where 
nothing has been measured (in the sense the world is normally understood), but the 
experiment ends up with a classical pointer pointing to some value on a scale of 
values. In Bohmian terms one may put the situation succinctly by saying that most 
of what can be measured is not real and most of what is real cannot be measured, 
position being the exception. 

So let us move on to the measurement problem. Consider a typical measurement 
experiment, i.e., an experiment in which the system wave function gets correlated 
with a pointer wave function. The latter is a macroscopic wave function, which 
can be imagined as a “random” superposition of macroscopically many (~ 107°) 
one-particle wave functions, with support¹ tightly concentrated around a region in 
configuration space (of ~ 107° particles) that makes up a pointer in physical space 
pointing in some direction, i.e., defining some pointer position. So different pointer 
positions belong to macroscopically disjoint wave functions, that is, wave functions 
whose supports are macroscopically separated in configuration space (see Fig. 9.1). 


1 The support of a function is the domain on which it is not equal to zero. The notions of support, 
separation of supports, and disjointness of supports have to be taken with a grain of salt. The support
of a Schrödinger wave function is typically unbounded and consists of (nearly) the whole of 
configuration space. “Zero” has thus to be replaced by “appropriately small” (in the sense that the 
square norm over the region in question is negligible). Then, the precise requirement of macro- 
scopic disjointness is that the overlap of the wave functions is extremely small in the square norm 
over any macroscopic region. 
The coordinates of the system particles will be denoted by X in some m-dimensional
configuration space, and for simplicity we assume that the system 
wave function is a superposition of the two wave functions $\psi_1(x)$ and $\psi_2(x)$ alone. 
The system interacts with an apparatus with particle coordinates Y in some n-dimensional
configuration space. Possible wave functions of the apparatus will be 
a pointer “null” wave function $\Phi_0(y)$ ($Y\in\operatorname{supp}\Phi_0 \mathrel{\hat=} 0$) and two additional pointer 
position wave functions $\Phi_1(y)$ ($Y\in\operatorname{supp}\Phi_1 \mathrel{\hat=} 1$) and $\Phi_2(y)$ ($Y\in\operatorname{supp}\Phi_2 \mathrel{\hat=} 2$). 
We call an experiment a measurement experiment whenever the interaction and the 
corresponding Schrödinger evolution of the coupled system is constructed in such a 
way that 

$$\psi_i\Phi_0 \xrightarrow{\;t\to T\;} \psi_i\Phi_i , \qquad i = 1,2 , \tag{9.1}$$

where T is the duration of the experiment and the arrow stands for the Schrödinger 
evolution. This means that the pointer positions correlate perfectly with the system 
wave functions.² We shall say that the pointer points to the outcomes 1 (resp. 2) if 
$\psi_1$ (resp. $\psi_2$) is the system wave function. 

Concerning the initial wave function of the system and apparatus, we can make 
the following remark. Starting the measurement experiment with a product wave 
function $\Psi(q) = \Psi(x,y) = \psi_i(x)\Phi_0(y)$ expresses the fact that the system and apparatus
are initially independent physical entities. The Schrödinger equation (8.4) for 
a product wave function separates into two independent Schrödinger equations, one 
for each factor, if the interaction potential $V(x,y)\approx0$. A warning is in order here: 
$V(x,y)\approx0$ on its own does not imply that the x and y parts develop independently, 
because the wave function $\Psi(x,y)$ need not be a product. To have physical independence,
we also require the velocity field $v^x$ of the X system to be a function of x 
alone. In view of (8.1), we observe that 


$$\frac{\nabla\Psi}{\Psi} = \nabla\ln\Psi ,$$

so that 

$$\nabla\ln\Psi(x,y) = \nabla\ln\big(\psi_i(x)\Phi_0(y)\big) = \nabla\ln\psi_i(x) + \nabla\ln\Phi_0(y) = \begin{pmatrix}\nabla_x\ln\psi_i(x) \\ \nabla_y\ln\Phi_0(y)\end{pmatrix} .$$

Hence the particle coordinates X are indeed guided by the system wave function $\psi_i$ 
if the combined system is guided by a product wave function. The arrow in (9.1) 
stands for a time evolution where the interaction is not zero, i.e., $V(x,y)\neq0$. However,
in this measurement experiment, these particular initial wave functions guarantee
that, at the end of the experiment, we once again obtain a product wave function. 


2 The wave function at the end of the experiment need not have the idealized product structure 
$\psi_i\Phi_i$. It can be replaced by an entangled wave function $\Psi_i(x,y)$ without changing the following 
arguments. 
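
The difference between a product and an entangled wave function can be seen numerically. The following Python sketch assumes the guiding law in the form v = (ħ/m) Im(∇Ψ/Ψ) with ħ/m = 1, one-dimensional system and pointer coordinates, and Gaussian packets with arbitrarily chosen parameters; all names and numbers are illustrative assumptions, not taken from the text. It evaluates the x-component of the velocity at a fixed x for several values of y.

```python
import cmath


def packet(center, k):
    """Gaussian wave packet centered at `center` with wave number `k`."""
    return lambda s: cmath.exp(-(s - center) ** 2 / 2 + 1j * k * s)


def velocity_x(psi, x, y, h=1e-5):
    """x-component of Im(grad psi / psi), using a central finite difference in x."""
    d_psi = (psi(x + h, y) - psi(x - h, y)) / (2 * h)
    return (d_psi / psi(x, y)).imag


psi_1, psi_2 = packet(0.0, 1.0), packet(0.5, -2.0)   # system wave functions
phi_0, phi_1 = packet(0.0, 0.0), packet(1.0, 3.0)    # pointer wave functions


def product(x, y):
    return psi_1(x) * phi_0(y)


def entangled(x, y):
    return psi_1(x) * phi_0(y) + psi_2(x) * phi_1(y)


def spread(values):
    return max(values) - min(values)


ys = [-1.0, 0.0, 0.7, 2.0]
v_product = [velocity_x(product, 0.3, y) for y in ys]
v_entangled = [velocity_x(entangled, 0.3, y) for y in ys]

print(spread(v_product))    # numerically zero: the x velocity does not depend on y
print(spread(v_entangled))  # clearly nonzero: the x velocity depends on y
```

For the product wave function the x velocity is the same for every y, up to discretization error, whereas for the superposition it changes with the pointer coordinate.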

The measurement problem (see also Remark 9.1) is a trivial mathematical consequence
of (9.1) and the linearity of the Schrödinger evolution. It comes about 
whenever the system wave function is a nontrivial superposition 

$$\psi(x) = \alpha_1\psi_1(x) + \alpha_2\psi_2(x) , \qquad |\alpha_1|^2 + |\alpha_2|^2 = 1 . \tag{9.2}$$

Then, by virtue of the linearity of the Schrödinger equation, (9.1) yields 

$$\psi\Phi_0 = \sum_{i=1,2}\alpha_i\psi_i\Phi_0 \xrightarrow{\;t\to T\;} \sum_{i=1,2}\alpha_i\psi_i\Phi_i , \tag{9.3}$$


which is a macroscopic superposition of pointer wave functions. If one has nothing 
but wave functions, this is a bad thing, because such a macroscopic superposition 
has no counterpart in the macroscopic, i.e., classical world. What could this be? 
A pointer pointing simultaneously to 1 and 2? Did the apparatus become a mushy 
marshmallow? 
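The step from (9.1) to (9.3) uses nothing but linearity, and this can be made concrete in a
finite-dimensional toy model. The following sketch is an illustration under stated assumptions,
not part of the argument above: the system has two states, the pointer has three (ready, 1, 2),
and the matrix returned by the hypothetical helper `measurement_unitary` stands in for the
Schrödinger evolution over the duration $T$. The Schmidt rank (the number of nonzero singular
values of the coefficient matrix) separates product states from entangled ones.

```python
import numpy as np

DIM_SYS, DIM_PTR = 2, 3  # toy dimensions: system states psi_1, psi_2; pointer states Phi_0, Phi_1, Phi_2


def measurement_unitary():
    """Permutation matrix U with U(psi_i (x) Phi_0) = psi_i (x) Phi_i, a toy stand-in for (9.1)."""
    U = np.zeros((DIM_SYS * DIM_PTR, DIM_SYS * DIM_PTR))
    for i in range(DIM_SYS):
        for p in range(DIM_PTR):
            # Conditioned on system state i, shift the pointer cyclically by i + 1.
            q = (p + i + 1) % DIM_PTR
            U[i * DIM_PTR + q, i * DIM_PTR + p] = 1.0
    return U


def schmidt_rank(state, tol=1e-12):
    """Number of nonzero singular values of the system-pointer coefficient matrix."""
    singular_values = np.linalg.svd(state.reshape(DIM_SYS, DIM_PTR), compute_uv=False)
    return int(np.sum(singular_values > tol))


alpha = np.array([0.6, 0.8])                # |alpha_1|^2 + |alpha_2|^2 = 1
pointer_ready = np.array([1.0, 0.0, 0.0])   # Phi_0
initial = np.kron(alpha, pointer_ready)     # (alpha_1 psi_1 + alpha_2 psi_2) Phi_0
final = measurement_unitary() @ initial     # linearity does the rest

print(schmidt_rank(initial), schmidt_rank(final))
```

The initial state is a product (Schmidt rank 1), while the final state has Schmidt rank 2: it
is the toy analogue of the macroscopic superposition on the right-hand side of (9.3), and no
choice of coefficients with both $\alpha_i \neq 0$ avoids it.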

In Bohmian mechanics on the other hand the pointer is there and it points at 
something definite. In Bohmian mechanics we have the evolution of the real state of 
affairs, namely the evolution of the coordinates given by (8.1). And by virtue of the 
quantum equilibrium hypothesis, we do not even need to worry about the detailed 
trajectories. If we are interested in the pointer position after the measurement exper- 
iment, i.e., in the actual configuration of the pointer particles Y(T) (see Fig. 9.2), 
we need only observe the following.

Fig. 9.2 Evolution of the system–pointer configuration in the measurement experiment. The
$Y$-axis (the pointer axis) represents the configuration space of macroscopic dimensions, while the
$X$-axis (the system axis) may be thought of as low-dimensional. Depending on the initial values
$(X_0,Y_0)$, at time $T$, the Bohmian trajectory ends up either in the upper or the lower of the two
macroscopically distant subsets that the support is split into in the system–pointer configuration
space

Given the wave function on the right-hand side of (9.3), by the quantum equilibrium
hypothesis, the probability that the pointer “points to $k$”, i.e., the probability that its
configuration is in the support of $\Phi_k$, is given by

$$\begin{aligned}
\mathbb{P}^\Psi\big(Y(T) \mathrel{\hat=} k\big) &= \mathbb{P}^\Psi\big(Y(T) \in \operatorname{supp}\Phi_k\big)\\
&= \int_{\{(x,y)\,|\,y\in\operatorname{supp}\Phi_k\}} \Big|\sum_{i=1,2}\alpha_i\psi_i(x)\Phi_i(y)\Big|^2 d^mx\,d^ny\\
&= \sum_{i=1,2}\int_{\{(x,y)\,|\,y\in\operatorname{supp}\Phi_k\}} |\alpha_i|^2\,|\psi_i(x)|^2\,|\Phi_i(y)|^2\, d^mx\,d^ny\\
&\quad + 2\,\mathrm{Re}\int_{\{(x,y)\,|\,y\in\operatorname{supp}\Phi_k\}} \alpha_1^*\alpha_2\,\psi_1^*(x)\psi_2(x)\,\Phi_1^*(y)\Phi_2(y)\, d^mx\,d^ny\\
&\approx |\alpha_k|^2\,.
\end{aligned}$$


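Note that $\operatorname{supp}\Phi_1 \cap \operatorname{supp}\Phi_2 \approx \emptyset$, and hence for $i \neq k$,

$$\int_{\operatorname{supp}\Phi_k} |\Phi_i(y)|^2\, d^ny \approx 0\,,$$

and likewise for the mixed terms

$$\int_{\operatorname{supp}\Phi_k} \Phi_1^*(y)\Phi_2(y)\, d^ny \approx 0\,.$$
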
Therefore the pointer points to position 1 with probability $|\alpha_1|^2$ and to position 2
with probability $|\alpha_2|^2$. Which pointer position results depends solely on the initial
coordinates $(X_0, Y_0)$ of the particles, as depicted in Fig. 9.2. Although this is not
necessary for measurement experiments, it may often be the case, possibly in line with
one's intuition, that the pointer's final position is determined essentially
by the initial positions $X_0$ of the system particles alone and not by the coordinates
$Y_0$ of the pointer particles. This is the case in a “spin measurement” using a
Stern–Gerlach apparatus (where the measurement is simply the detection of already split
trajectories).
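The probability computation above can be checked numerically in a low-dimensional caricature.
The sketch below makes several assumptions that are not in the text: system and pointer are
each one-dimensional, all wave functions are Gaussians (so the pointer supports are only
approximately disjoint), and the packet centers, widths and coefficients are arbitrary
illustrative values. The system packets are deliberately chosen non-orthogonal, to show that it
is the separation of the pointer packets that kills the mixed terms.

```python
import numpy as np


def gaussian(u, center, width):
    """Real Gaussian wave function with unit L2 norm on the real line."""
    return (2 * np.pi * width**2) ** -0.25 * np.exp(-((u - center) ** 2) / (4 * width**2))


# Grids for the system coordinate x and the pointer coordinate y.
x = np.linspace(-15.0, 15.0, 601)
y = np.linspace(-25.0, 25.0, 1001)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")

alpha1, alpha2 = 0.6, 0.8j  # |alpha1|^2 + |alpha2|^2 = 1

# Right-hand side of (9.3): pointer packets Phi_1, Phi_2 sit at y = +10 and y = -10.
psi_total = (alpha1 * gaussian(X, -1.0, 1.0) * gaussian(Y, 10.0, 1.0)
             + alpha2 * gaussian(X, 2.0, 1.5) * gaussian(Y, -10.0, 1.0))
density = np.abs(psi_total) ** 2

# Quantum equilibrium: probability that Y(T) lies in the region of Phi_1 (y > 0) or Phi_2 (y < 0).
prob_pointer_1 = float(density[:, y > 0].sum() * dx * dy)
prob_pointer_2 = float(density[:, y < 0].sum() * dx * dy)

print(prob_pointer_1, abs(alpha1) ** 2)
print(prob_pointer_2, abs(alpha2) ** 2)
```

Moving the two pointer packets closer together (for instance to $y = \pm 1$) makes the mixed
terms visible and the two numbers in each line no longer agree.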


Remark 9.1. The Song and Dance about the Measurement Problem 
Without Bohmian mechanics the result (9.3) has no physical meaning, unless one 
interprets the wave function as an instrument for computing the probabilities that 
this or that pointer position results. But this then means that there is a pointer, and 
this might as well be the Bohmian one, whence nothing more needs to be said. 
However, physicists have been convinced that the innovation of quantum mechanics 
is something like this: the macroscopic world is real, but it cannot be described by 
microscopic constituents governed by a physical law. On the other hand the pointer
moves from 0 to 1 or 2, just as Schrödinger's cat either dies or stays alive, so some
movement is going on. Why should that not be describable?

It seems therefore an inevitable consequence that one must deny altogether that 
there is anything real besides the wave function. This is a clean attitude, and one
arrives at the clearest form of the measurement problem, because (9.3) is not what one
sees. To overcome this defect the notion of “observer” was introduced. The observer 
makes everything alright, because it/he/she collapses the pointer wave function to a 
definite outcome. The observer brings things into being, so to speak, by observing, 
where “observing” is just a fancy way of saying “looking”. As if the apparatus alone 
could not point to anything in particular, but rather — well what? As if the cat could 
not know whether it was alive, its brain being in a grotesque state of confusion, and 
it therefore had to wait to become definitely alive or dead until somebody looked at 
it. 

But what qualifies an object to be an observer? Is a cat not an observer? Must the 
moon be observed in order to be there? Can only a subject with a PhD in physics 
act as an observer, as John Bell and Richard Feynman liked to ask with suitable 
irony? The observer is macroscopic alright, but how big is macroscopic? One must 
admit that these are all funny questions, but they have been earnestly discussed by 
physicists. 

But let us get back to reason. An observer is simply another, albeit huge, pointer!
This means that, when the observer looks, i.e., when she/he interacts with the system 
(the apparatus) in question, the disjointness of the wave function supports (now
subsets of the system–apparatus–observer configuration space) becomes even greater,
and we have the same line of argument again: the huge wave function (including 
the observer) splits into a macroscopic superposition and the measurement problem 
remains. 

So now what? Well, nothing. Some say that decoherence comes to the rescue. 
When applied to the measurement problem, decoherence theory (which we dis- 
cuss a little further below, in the context of effective collapse) attempts to explain 
or prove or argue the obvious: that it is impossible for all practical purposes to 
bring such macroscopically disjoint macroscopic wave functions into interference. 
In other words, for all practical purposes — fapp, to use John Bell’s abbreviation — it 
is essentially impossible to bring the macroscopically disjoint supports of the wave 
functions together so that there is any significant overlap. Just as it is essentially 
impossible to have the wave functions of the dead and alive cat interfere. Just as it is 
essentially impossible for a gas which has left a bottle and filled a box to go back into 
the bottle. Of course, in principle it can be done, because one just has to reverse all 
the velocities at the same time and take care that there is no uncontrolled interaction 
with the walls of the box. So interference of macroscopically separated macroscopic 
wave packets is fapp impossible. Wave functions which consist of macroscopically 
separated macroscopic wave packets are fapp indistinguishable from statistical mix- 
tures (which we also discuss further below) of two non-entangled wave functions. 
This is true. This is why Schrödinger needed to add in his cat example that there was
a difference between a shaky or out-of-focus photograph and a snapshot of clouds
and fog banks.³

³ Es ist ein Unterschied zwischen einer verwackelten oder unscharf eingestellten Photographie und
einer Aufnahme von Wolken und Nebelschwaden.

Besides Bohmian mechanics, the only other serious approach to resolving the
measurement problem is to make the evolution (9.3) theoretically impossible. This
means that the linear Schrödinger evolution is not valid. It has to be replaced by one
that does not allow macroscopic superpositions, i.e., wave packet reduction becomes
part of the theoretical description. This is known as the GRW theory or dynamical
reduction model, and good accounts can be found in [1-4].


Remark 9.2. The Wave Function Is Not Measurable 
One is sometimes told that the virtue of quantum mechanics is that it only talks 
about quantities that can be measured, and since by Heisenberg’s uncertainty rela- 
tion the momentum and position of a particle cannot be measured simultaneously, 
one must not introduce particles and their positions into the theory. We shall discuss 
this nonsense in more detail later, but first let us take the opportunity to point out a 
simple consequence of (9.3), namely that the wave function cannot be measured. 
Here is the argument. Imagine a piece of apparatus which measures wave functions.
This means that, for every wave function $\psi$ a system can have, there is a
pointer position such that, in view of (9.1), one has
$\psi\,\Phi_0 \xrightarrow{\,t\to T\,} \psi\,\Phi_\psi$, i.e., $\psi$ is pointed
at if $\psi$ is the wave function the system happens to be in. Now such an apparatus does
not exist. This is actually what (9.3) tells us! For suppose such an apparatus did point
out either $\psi_1$ or $\psi_2$. Then if the system wave function is the superposition (9.2), the
apparatus wave function must evolve by virtue of mathematical logic into (9.3), i.e.,
into a superposition of pointer positions, and not to
$(\alpha_1\psi_1 + \alpha_2\psi_2)\,\Phi_{\alpha_1\psi_1+\alpha_2\psi_2}$.
So it is easy to find a quantum mechanical quantity that cannot be measured.
Fortunately, one has Bohmian mechanics and the particle positions, because they can be
measured. And because of that, we can sometimes find out what the wave function
is, using the Bohmian positions and the theory, of course.


9.2 Effective Collapse 


Now let us return to (9.3) and more important issues. If the pointer points to 1, i.e.,
if

$$Y(T) \in \operatorname{supp}\Phi_1\,, \tag{9.4}$$

we can fapp forget the wave packet $\psi_2\Phi_2$ in the further physical description of the
world. In view of (8.1), and because
$\operatorname{supp}\psi_1\Phi_1 \cap \operatorname{supp}\psi_2\Phi_2 = \emptyset$, the velocity field on
the combined configuration space for the system and apparatus induced by the wave
function on the right-hand side of (9.3) becomes (the $\alpha$s do not matter)

$$\frac{\nabla(\psi_1\Phi_1 + \psi_2\Phi_2)}{\psi_1\Phi_1 + \psi_2\Phi_2} =
\begin{cases}
\dfrac{\nabla(\psi_1\Phi_1)}{\psi_1\Phi_1} & \text{for } y \in \operatorname{supp}\Phi_1\,,\\[2ex]
\dfrac{\nabla(\psi_2\Phi_2)}{\psi_2\Phi_2} & \text{for } y \in \operatorname{supp}\Phi_2\,,
\end{cases} \tag{9.5}$$

so that only $\psi_1\Phi_1$ “guides” if (9.4) holds, i.e., if the pointer position is 1. Fapp
$\psi_2\Phi_2$ will not become effective again, as that would require $\psi_1\Phi_1$ and $\psi_2\Phi_2$ to
interfere, either by controlling $10^{26}$ “random phases”, an impossible task, or by
Poincaré recurrence (see Sect. 9.5.2), which would take a ridiculously long lapse of
time. Hence, as far as the further evolution of our world is concerned, we can forget
about $\psi_2$, i.e., Bohmian mechanics provides us with a collapsed wave function.
If $Y(T) \in \operatorname{supp}\Phi_1$, the effective wave function is $\psi_1\Phi_1$ and that means, by the
argument (9.2), that the effective wave function of the system is $\psi_1$. Put another
way, we might say that, due to the measurement process, $\psi$ collapses to $\psi_1$.
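That the empty branch drops out of the guidance law can be seen numerically. The sketch below
is a one-dimensional caricature under assumptions of our own choosing: only the pointer
coordinate $y$ is kept, the two branches are Gaussian packets centered at $y = \pm 10$ carrying
plane-wave factors with hypothetical wave numbers $2$ and $-3$, and units are such that
$\hbar = m = 1$. It compares the velocity field of the full superposition with that of the
first branch alone on the region where the first pointer packet lives.

```python
import numpy as np

HBAR = MASS = 1.0  # illustrative units


def packet(y, center, wave_number, width=1.0):
    """Gaussian packet centered at `center` carrying the plane-wave factor exp(i k y)."""
    return np.exp(-((y - center) ** 2) / (4 * width**2) + 1j * wave_number * y)


def packet_derivative(y, center, wave_number, width=1.0):
    """Analytic y-derivative of `packet`."""
    return (-(y - center) / (2 * width**2) + 1j * wave_number) * packet(y, center, wave_number, width)


def velocity(psi, dpsi):
    """Bohmian velocity field (hbar/m) Im(psi'/psi)."""
    return (HBAR / MASS) * np.imag(dpsi / psi)


y = np.linspace(8.0, 12.0, 401)  # region around the first pointer packet
a1, a2 = 0.6, 0.8                # the coefficients do not matter, cf. (9.5)

branch_1 = a1 * packet(y, 10.0, 2.0)
branch_2 = a2 * packet(y, -10.0, -3.0)
d_branch_1 = a1 * packet_derivative(y, 10.0, 2.0)
d_branch_2 = a2 * packet_derivative(y, -10.0, -3.0)

v_full = velocity(branch_1 + branch_2, d_branch_1 + d_branch_2)
v_branch = velocity(branch_1, d_branch_1)
max_deviation = float(np.max(np.abs(v_full - v_branch)))

print(max_deviation)
```

On this region the second branch is smaller than the first by dozens of orders of magnitude, so
the deviation is far below anything observable; this is the fapp content of (9.5).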

This collapse is not a physical process, but an act of convenience. It is introduced 
because it would simply be uneconomical to keep the ineffective wave functions, 
and the price we pay for forgetting them amounts to nothing. In just the same way, 
we can safely forget the fact that the velocities of gas molecules that have escaped 
from a bottle into a (much larger) container are such that reversing them all at the 
same time would result in the gas rushing back into the bottle. The gas molecules 
will interact with the walls of the box (exchanging heat for example) and the walls of 
the box will further interact with the outside world. In the end, reversal of velocities 
will no longer be sufficient to get the gas back into the bottle, as the information that 
the gas was once in the bottle is now spread all over the place. 

We have exactly the same effect in quantum mechanics, where it is called deco- 
herence. The separation of waves happens more or less all the time, and more or 
less everywhere, because measurement experiments that measure position are ubiq- 
uitous. The pointer (as synonymous with the apparatus) looks at where the particle 
is, the air molecules in the lab look at where the pointer is (by bouncing off the 
pointer), the light passing through the lab looks at how the air molecules bounced 
off the pointer, and so it goes on. 

Here is an example along these lines. A particle is initially in a superposition
of two spatially separated wave packets $\psi_L(x) + \psi_R(x)$, where $\psi_L$ moves to the left
and $\psi_R$ moves to the right (see Fig. 9.3). To the right there is a photographic plate. If
the particle hits the plate, it will blacken it at its point of arrival. So the photographic
plate is a piece of apparatus and in this case the “pointer points” either at a black
spot or nothing. If the particle is guided by $\psi_R$, then the plate will eventually feature
a black spot. If no black spot shows up, then the particle travels with $\psi_L$. As in (9.3),
Schrödinger evolution thus leads to

$$\big[\psi_L(x) + \psi_R(x)\big]\Phi(y) \;\xrightarrow{\,t\to T\,}\; \psi_L(x,T)\,\Phi(y) + \Phi_B(y,x,T)\,, \tag{9.6}$$

where $\Phi_B$ stands for the blackened plate.

Can it happen that the position $X$ is in the support of $\psi_L$ and the plate nevertheless
shows a black spot? In fact it cannot, because the supports of $\Phi(y)$ and $\Phi_B(y,x,T)$
are disjoint. Why is this? Because the black spot arises from a macroscopic chemical
reaction at the particle's point of arrival (macroscopic because the black spot can
be seen with the naked eye). Bohmian dynamics thus precludes the possibility of
simultaneously having $X \in \operatorname{supp}\psi_L(T)$ and $Y \in \operatorname{supp}\Phi_B(T)$ (see Fig. 9.4).

Fig. 9.3 The measurement process. Where is the particle?

Fig. 9.4 Measurement of a particle's position, depicted in the configuration space of the
photographic plate and particle


In the foregoing we described a position measurement with the help of a photo- 
graphic plate. Is there anything special about the plate? Of course, there is nothing 
at all special about it! We could just as well have used light waves that scatter off 
the particle and measured its position in that way. Light rays scattered off a parti- 
cle are different from undisturbed light rays. So just like a photographic plate, light 
rays produce decoherence. The particle wave function gets entangled with the wave 
function of the photons in the light ray. So once again we can effectively collapse the 
particle wave function to where the particle is. After all, since we wish to describe 
the world as we experience it, it is relevant to us where the particle is. So it is now 
the light waves that act as pointer (see Fig. 9.5). 

Fig. 9.5 Light waves as pointer

But the procedure does not stop there. The light interacts with other things and
so changes configurations in the environment (wherever the light goes, there will
be changes). So the entanglement, the fapp impossibility of interference, continues
to grow. This is decoherence at work. But decoherence alone does not create facts.
The facts are already there in the form of the Bohmian positions! Only the fapp
description of that real state of affairs is a collapsed one. Schrödinger once thought
that a cat was a big enough pointer to get that point across, and that some description
of the real state of affairs was needed for a physical description.

The effective collapse becomes even more stable once the results of a measure- 
ment have been recorded. Interference of what was once the particle wave function 
now means that all the records must also be brought under control and made to in- 
terfere. That does seem hopeless. The collapse is stable for all times relevant to us. 
The moral is that interactions typically destroy coherence, since they “measure” the 
particle positions. Fapp collapse of wave functions is the rule, and that is essential 
for understanding how the classical macroscopic world emerges. 

In Remark 9.1, we mentioned an alternative approach for solving the measure- 
ment problem, which involved introducing the collapse as a fundamental physical 
event. Such a fundamentally random theory, in which a random collapse is part of 
the theoretical description, will ensure that the collapse is effective only for large 
systems. How is it possible to distinguish experimentally between Bohmian me- 
chanics and a collapse theory? The answer is, by experimentally achieving a macro- 
scopic superposition, i.e., by forcing the wave functions of a dead and alive cat to 
interfere. But perhaps we had better not use a cat. Some less cruel and smaller sized 
experiment would do. However, one should realize the problematic nature of such 
experiments. For how can one hope to control the effects of decoherence? Without
control over the ubiquitous decoherence, it does not prove much, if one cannot see
interference of mesoscopic/macroscopic systems. On the other hand, if one could 
experimentally achieve interference, well then everything depends on whether the 
interfering system is big enough to be sure that it will be collapsed according to the 
collapse theory, which of course contains parameters that can be adjusted to cover 
quite a wide range. It will be a long time before we can hope for experimental help 
on deciding which of these theories, Bohmian mechanics or spontaneous localiza- 
tion, is physically correct. 


9.3 Centered Wave Packets


Let us follow the particle in Fig. 9.5 for a while. Suppose the particle is alone in the
universe, “in” a wave packet of the form (6.15), viz.,

$$\psi(x,t) = \int e^{i(k\cdot x - \omega(k)t)}\,\hat f_{k_0}(k)\, d^3k\,,$$

with $\omega(k) = \hbar k^2/2m$. This is the solution of the one-particle Schrödinger equation
(8.4) with $V = 0$. The wave moves with group velocity $(\partial\omega/\partial k)(k_0) = \hbar k_0/m$, and
according to the stationary phase argument, the width of the $k$ distribution given
by the exponential function is proportional to $1/t$ [see (6.18)]. The wave
therefore spreads linearly with $t$. This spreading is a generic wave phenomenon for the
Schrödinger evolution that results from the special dispersion relation. However,
according to (6.18) (with $\gamma = \hbar/2m$), the larger the mass, the smaller the spreading
rate, i.e., the wave function of a very massive particle stays localized (around its
center) for quite a long time.
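To get a feeling for the orders of magnitude, one can evaluate the spreading for a Gaussian
packet. The sketch below does not use (6.18) itself but the standard closed-form width of a
free Gaussian packet, $\sigma(t) = \sigma_0\sqrt{1 + (\hbar t/2m\sigma_0^2)^2}$, which we take as
an assumption here. The two masses and the initial width are illustrative choices (an electron
and a hypothetical microgram dust grain, both initially localized to about one ångström).

```python
import math

HBAR = 1.054571817e-34  # J s


def packet_width(t, mass, sigma0):
    """Width at time t of a free Gaussian packet with initial width sigma0 (SI units)."""
    return sigma0 * math.sqrt(1.0 + (HBAR * t / (2.0 * mass * sigma0**2)) ** 2)


ELECTRON_MASS = 9.1093837e-31  # kg
GRAIN_MASS = 1.0e-9            # kg, hypothetical microgram dust grain
SIGMA0 = 1.0e-10               # m, illustrative initial width

electron_width = packet_width(1.0, ELECTRON_MASS, SIGMA0)  # after one second
grain_width = packet_width(1.0, GRAIN_MASS, SIGMA0)

print(f"electron: {electron_width:.3e} m")
print(f"grain:    relative growth {grain_width / SIGMA0 - 1.0:.3e}")
```

With these numbers the electron packet has spread over hundreds of kilometres after one second,
while the width of the grain's packet has changed by a relative amount of order $10^{-11}$.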

Now consider a wave packet over such a period of time that spreading can be
neglected. Let us follow the evolution of the position $X$ of a Bohmian particle. Due to
quantum equilibrium, $X(t)$ moves with the packet. But how does the packet move?
According to Schrödinger's equation, of course:

$$i\hbar\frac{\partial\psi}{\partial t}(x,t) = -\frac{\hbar^2}{2m}\Delta\psi(x,t) + V(x)\psi(x,t) = H\psi(x,t)\,, \tag{9.7}$$

with

$$H = -\frac{\hbar^2}{2m}\Delta + V(x) \tag{9.8}$$

as Hamilton operator. For $X(x,t)$ with initial value $x$, we get

$$\dot X(x,t) = \frac{\hbar}{m}\,\mathrm{Im}\,\frac{\nabla\psi}{\psi}\big(X(x,t),t\big)\,.$$


Since the packet is well localized in position, instead of $X(x,t)$, we consider its
expectation value $\mathbb{E}^\psi(X(t))$. According to the quantum equilibrium hypothesis and
equivariance, this is given by

$$\begin{aligned}
\langle X\rangle^\psi(t) &:= \mathbb{E}^\psi(X(t))\\
&= \int X(x,t)\,|\psi(x,0)|^2\, d^3x\\
&= \int x\,|\psi(x,t)|^2\, d^3x\,.
\end{aligned} \tag{9.9}$$


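For ease of notation we shall omit the superscript $\psi$ from now on. Using (7.17), we
obtain

$$\begin{aligned}
\frac{d}{dt}\langle X\rangle(t) &= \int x\,\frac{\partial}{\partial t}|\psi(x,t)|^2\, d^3x\\
&= -\int x\,\nabla\cdot j(x,t)\, d^3x\\
&= \int j(x,t)\, d^3x\\
&= \langle v^\psi(X(t),t)\rangle\,,
\end{aligned} \tag{9.10}$$

where we have used partial integration and set the boundary terms to zero.
Intuitively, a Schrödinger wave that does not spread should move classically. Therefore
we consider $d^2\langle X\rangle/dt^2$. For this we need $\partial j/\partial t$. Using (7.12), we get

$$\frac{\partial j}{\partial t} = -\frac{i\hbar}{2m}\left[\frac{\partial\psi^*}{\partial t}\nabla\psi + \psi^*\nabla\frac{\partial\psi}{\partial t} - \frac{\partial\psi}{\partial t}\nabla\psi^* - \psi\nabla\frac{\partial\psi^*}{\partial t}\right],$$

and with (7.11) and (9.7),

$$\frac{\partial j}{\partial t} = \frac{1}{2m}\Big[(H\psi^*)\nabla\psi - \psi^*\nabla(H\psi) + (H\psi)\nabla\psi^* - \psi\nabla(H\psi^*)\Big].$$

Further, by partial integration, we have for wave packets $\psi$ and $\varphi$

$$\int \psi^* H\varphi\, d^3x = \int (H\psi^*)\,\varphi\, d^3x\,,$$

a property which we shall later call symmetry (see Chap. 14). So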


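$$\begin{aligned}
\frac{d^2}{dt^2}\langle X\rangle(t) &= \int \frac{\partial}{\partial t}j(x,t)\, d^3x\\
&= \frac{1}{2m}\int\big[\psi^* H\nabla\psi - \psi^*\nabla(H\psi) + \psi H\nabla\psi^* - \psi\nabla(H\psi^*)\big]\, d^3x\\
&= \frac{1}{2m}\int\big[\psi^* V\nabla\psi - \psi^*\nabla(V\psi) + \psi V\nabla\psi^* - \psi\nabla(V\psi^*)\big]\, d^3x \quad [\text{by (9.8)}]\\
&= \frac{1}{2m}\int\big[-(\nabla V)\psi^*\psi - (\nabla V)\psi\psi^*\big]\, d^3x\\
&= \frac{1}{m}\langle -\nabla V(X)\rangle(t)\,.
\end{aligned}$$

Hence we arrive at Newtonian equations in the mean, so to speak, a version of the
Ehrenfest theorem:

$$m\,\langle\ddot X\rangle = \langle -\nabla V(X(t))\rangle\,, \tag{9.11}$$

and we would have the classical limit and the final identification of the parameter $m$
with Newtonian mass if

$$\langle\nabla V(X(t))\rangle = \nabla V(\langle X(t)\rangle)\,. \tag{9.12}$$

For this, however, we would need

$$\mathrm{Var}(X) = \big\langle (X - \langle X\rangle)^2\big\rangle \approx 0\,,$$

which means that $\psi(x,t)$ would be a very well localized wave.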
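The relation (9.11) can be checked numerically. The following is a sketch under assumptions
that are ours, not the text's: one space dimension, units with $\hbar = m = 1$, a hypothetical
anharmonic potential $V(x) = x^2/2 + x^4/10$, a displaced Gaussian as initial wave function, and
a standard symmetric split-step Fourier integrator for the Schrödinger equation (9.7). The
acceleration of $\langle X\rangle$ is estimated by a central second difference over three
consecutive time steps and compared with the mean force.

```python
import numpy as np

HBAR = MASS = 1.0  # illustrative units


def strang_step(psi, potential, k, dt):
    """One step of the symmetric split-step Fourier scheme for i hbar psi_t = H psi."""
    half_kick = np.exp(-0.5j * potential * dt / HBAR)
    drift = np.exp(-0.5j * HBAR * k**2 * dt / MASS)
    psi = half_kick * psi
    psi = np.fft.ifft(drift * np.fft.fft(psi))
    return half_kick * psi


def expectation(values, psi, dx):
    """Expectation of a function of position with respect to |psi|^2."""
    return float(np.sum(values * np.abs(psi) ** 2) * dx)


n, box = 1024, 40.0
dx = box / n
x = (np.arange(n) - n // 2) * dx
k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)

potential = 0.5 * x**2 + 0.1 * x**4  # hypothetical anharmonic potential V
force = -(x + 0.4 * x**3)            # -V'(x)

psi = np.exp(-((x - 1.0) ** 2) / 2.0).astype(complex)  # packet displaced from the minimum
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

dt = 1.0e-3
for _ in range(200):  # evolve to t = 0.2
    psi = strang_step(psi, potential, k, dt)

psi_prev = psi
psi_mid = strang_step(psi_prev, potential, k, dt)
psi_next = strang_step(psi_mid, potential, k, dt)

mean_prev, mean_mid, mean_next = (expectation(x, p, dx) for p in (psi_prev, psi_mid, psi_next))
mass_times_acceleration = MASS * (mean_next - 2.0 * mean_mid + mean_prev) / dt**2
mean_force = expectation(force, psi_mid, dx)

print(mass_times_acceleration, mean_force)
```

The two printed numbers should agree up to the discretization error of the scheme. Note that
the right-hand side is the mean force, not the force at the mean position; the difference
between the two is exactly what (9.12) is about.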
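Let us make that a little more precise. Expand $V$ up to third order around $\langle X\rangle$:

$$\begin{aligned}
V(X) ={}& V(\langle X\rangle) + (X - \langle X\rangle)\cdot\nabla V(\langle X\rangle) + \frac{1}{2}\big[(X - \langle X\rangle)\cdot\nabla\big]^2 V(\langle X\rangle)\\
&+ \frac{1}{3!}\big[(X - \langle X\rangle)\cdot\nabla\big]^3 V(\langle X\rangle)\,.
\end{aligned} \tag{9.13}$$

So the expectation value of $\nabla V(X)$ is given by

$$\langle\nabla V(X)\rangle = \nabla V(\langle X\rangle) + \frac{1}{2}\Big\langle\big[(X - \langle X\rangle)\cdot\nabla\big]^2\Big\rangle\,\nabla V(\langle X\rangle)\,.$$

We thus establish as a rule of thumb that, in order to have classicality of the motion
in a potential, the width of the wave function should obey

$$\sqrt{\mathrm{Var}(X)} \ll \sqrt{\left|\frac{V'}{V'''}\right|}\,. \tag{9.14}$$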
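The rule of thumb can be illustrated with a few lines of numerics. In the sketch below the
position distribution is assumed to be Gaussian and the potential is the hypothetical choice
$V(x) = -\cos x$ in one dimension, so that $V' = \sin x$, $V''' = -\sin x$ and $|V'/V'''| = 1$.
We compare the exact $\langle V'(X)\rangle$ with $V'(\langle X\rangle)$, as in (9.12), and with the
second-order corrected expression obtained from the expansion of $V$.

```python
import numpy as np


def gaussian_expectation(func, mean, width, n=4001, span=10.0):
    """Expectation of func(X) for X Gaussian with the given mean and standard deviation."""
    x = np.linspace(mean - span * width, mean + span * width, n)
    density = np.exp(-((x - mean) ** 2) / (2 * width**2)) / np.sqrt(2 * np.pi * width**2)
    return float(np.sum(func(x) * density) * (x[1] - x[0]))


def potential_gradient(x):
    """V'(x) for the hypothetical potential V(x) = -cos(x)."""
    return np.sin(x)


def potential_third_derivative(x):
    """V'''(x) for the same potential."""
    return -np.sin(x)


mean = 1.0
results = []
for width in (1.0, 0.3, 0.1):
    exact = gaussian_expectation(potential_gradient, mean, width)            # <V'(X)>
    classical = float(potential_gradient(mean))                              # V'(<X>)
    corrected = classical + 0.5 * width**2 * float(potential_third_derivative(mean))
    results.append((width,
                    abs(exact - classical) / abs(classical),   # violation of (9.12)
                    abs(exact - corrected) / abs(classical)))  # error after the Var(X) correction

for width, error_classical, error_corrected in results:
    print(width, error_classical, error_corrected)
```

For a width comparable to $\sqrt{|V'/V'''|} = 1$ the mean force differs from the force at the
mean position by tens of percent, while for a width of $0.1$ the difference is below one
percent and is almost entirely accounted for by the variance term.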


So far so good. But now we have to admit that the spreading will eventually become 
effective. But the particle is not alone in the universe. It will interact with everything
it can in its environment. If this interaction is strong enough, it may result in a
measurement-like process (as we have discussed before), and an effective collapse 
will thus take place, countering the effect of spreading. So the collapsed wave packet 
remains localized and the above reasoning about classicality seems to be alright 
if we take into account the fact that the environment acts like a piece of position 
measuring apparatus. In other words, we do generically have (9.12), and hence with 
(9.11), classical physics. 

However, this argument is faulty, because we have assumed the validity of the 
Schrédinger evolution. But why should that hold true for a wave function that col- 
lapses all the time due to interactions with the environment? Is the disturbance of the 
environment such that it only decoheres (i.e., counters the spreading of) the wave 
function, without disturbing the evolution of the center of the wave packet? The
situation is reminiscent of Brownian motion, where the effects of the environment lead
to diffusion and friction (dissipation of energy). The question is therefore: Does 
the “reading” of the particle’s position by the environment happen on a different 
(shorter) time scale than friction and dissipation, which are also generated by the 
interaction? The answer must be sought in the proper derivation of an appropri- 
ate new phenomenological equation which should describe the “reading” process in 
such a way that spreading is suppressed, i.e., (9.11) holds, only very little (or no) 
friction and diffusion is present, and (9.12) still holds. We shall say more about this 
in Sect. 9.4. 


9.4 The Classical Limit of Bohmian Mechanics 


In Bohmian mechanics the question of the classical limit is simply this: Under what 
physical conditions are the Bohmian trajectories close to Newtonian trajectories? 
In the previous section we discussed one possibility, in a rather hand-waving way, 
namely the evolution of narrow wave packets. If they move classically, the Bohmian 
trajectories do so, too, because they are dragged along, as it were. But now we 
would like to find a more general answer. At some point the narrow wave packets 
will certainly play a (technical) role, but they do not provide a fundamental answer. 
In fact the best hopes for a fundamental answer lie in exactly the opposite direction: 
a freely moving wave packet that spreads all over the place! In the long run the 
Bohmian trajectories of a freely moving wave packet become classical. 

To see this we have to specify what we mean by “in the long run”. We shall have 
to look at the wave function on a macroscopic scale (in time and space). Where 
else would we expect to see classical behavior? We have already encountered such 
a macroscopic scaling in our treatment of Brownian motion. However, the scaling 
here is a bit different, since we do expect ballistic rather than diffusive motion. Now, 
the macroscopic position of the Bohmian particle at a macroscopic time is 


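$$X_\varepsilon(x,t) = \varepsilon X\Big(\frac{x}{\varepsilon},\frac{t}{\varepsilon}\Big)\,. \tag{9.15}$$

Here we think of $\varepsilon$ as being very small, eventually even going to zero for precise
limit statements. Note that $\dot X_\varepsilon(x,t) = v^\psi\big(X(x/\varepsilon,t/\varepsilon),\,t/\varepsilon\big)$ is of order one.
The wave function on this scale is

$$\psi_\varepsilon(x,t) = \varepsilon^{-3/2}\,\psi\Big(\frac{x}{\varepsilon},\frac{t}{\varepsilon}\Big)\,,$$

so the quantum equilibrium density (with norm one) becomes $|\psi_\varepsilon(x,t)|^2$. This is
easily seen by computing any expectation value. A simple change of variables gives

$$\mathbb{E}^\psi\big[f(X_\varepsilon(t))\big] = \int f(\varepsilon x)\,\Big|\psi\Big(x,\frac{t}{\varepsilon}\Big)\Big|^2 d^3x
= \int f(x)\,\varepsilon^{-3}\Big|\psi\Big(\frac{x}{\varepsilon},\frac{t}{\varepsilon}\Big)\Big|^2 d^3x = \mathbb{E}^{\psi_\varepsilon}\big[f(X(t))\big]\,.$$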


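The Schrödinger equation for the free evolution becomes after rescaling

$$i\hbar\varepsilon\,\frac{\partial\psi_\varepsilon}{\partial t}(x,t) = -\frac{\hbar^2\varepsilon^2}{2m}\Delta_x\psi_\varepsilon(x,t)\,. \tag{9.16}$$

Now note that $\varepsilon$ appears only in the combination $\varepsilon\hbar$. Thus the unphysical limit $\hbar \to 0$
can in fact be interpreted as the macroscopic limit $\varepsilon \to 0$.
If the Schrödinger equation contains a potential $V$, then (9.16) becomes

$$i\hbar\varepsilon\,\frac{\partial\psi_\varepsilon}{\partial t}(x,t) = -\frac{\hbar^2\varepsilon^2}{2m}\Delta_x\psi_\varepsilon(x,t) + V\Big(\frac{x}{\varepsilon}\Big)\psi_\varepsilon(x,t)\,.$$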


We make this observation in order to establish the following point. For classical
behavior, one will have to assume that the potential varies on the macroscopic scale,
i.e., $V(x/\varepsilon) = U(x)$. But the potential is given by the physical situation so, if
anything, it is the potential that defines the macroscopic scale, i.e., it is the potential that
defines the scaling parameter $\varepsilon$. More precisely, $\varepsilon$ will be given by the ratio of two
length scales, the width of the wave function and the characteristic length on which
the potential varies [for the latter compare with (9.14)]. We do not wish to pursue
this further here (but see [5]).

We continue with the free evolution. We wish to find an expression for $\psi_\varepsilon(x,t)$
when $\varepsilon \approx 0$. The solution of (9.16) is a superposition of plane waves and this is done
with mathematical rigor in Remark 15.7. This involves nothing other than solving
the equation via Fourier transformation (as we did for the heat equation). We should
be able to do that with our eyes shut, never mind all the rigorous mathematics it may
be shrouded in:

$$\psi_\varepsilon(x,t) = (2\pi\varepsilon)^{-3/2}\int \exp\Big[\frac{i}{\varepsilon}\Big(k\cdot x - \frac{\hbar k^2}{2m}t\Big)\Big]\hat\psi(k)\, d^3k\,, \tag{9.17}$$

where $\hat\psi(k)$ is the Fourier transform of the wave function $\psi$ at time zero.
Now let us go on with the stationary phase argument. It says that for small $\varepsilon$ the
only contribution comes from that $k$ value for which the phase

$$S(k) = k\cdot x - \frac{\hbar k^2}{2m}t$$

is stationary. This value is given by $k_0 = mx/\hbar t$. Expanding around the stationary
point,

$$S(k) = \frac{m x^2}{2\hbar t} - \frac{\hbar t}{2m}(k - k_0)^2\,,$$

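putting this into (9.17), and multiplying by the phase factor, this gives

$$\begin{aligned}
\exp\Big(-\frac{i}{\varepsilon\hbar}\frac{mx^2}{2t}\Big)\psi_\varepsilon(x,t)
&= (2\pi\varepsilon)^{-3/2}\exp\Big(-\frac{i}{\varepsilon\hbar}\frac{mx^2}{2t}\Big)\int \exp\Big(\frac{i}{\varepsilon}S(k)\Big)\hat\psi(k)\, d^3k\\
&= (2\pi\varepsilon)^{-3/2}\int \exp\Big(-\frac{i\hbar t}{2m\varepsilon}\Big(k - \frac{mx}{\hbar t}\Big)^2\Big)\hat\psi(k)\, d^3k\\
&= (2\pi\varepsilon)^{-3/2}\int \exp\Big(-\frac{i\hbar t}{2m\varepsilon}k^2\Big)\hat\psi\Big(k + \frac{mx}{\hbar t}\Big)\, d^3k\\
&= \Big(\frac{m}{\pi\hbar t}\Big)^{3/2}\int \exp\big(-iy^2\big)\,\hat\psi\Big(\sqrt{\frac{2m\varepsilon}{\hbar t}}\,y + \frac{mx}{\hbar t}\Big)\, d^3y\,.
\end{aligned} \tag{9.18}$$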

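Next we let $\varepsilon \to 0$. (In Remark 15.8 we shall do these asymptotics in a rigorous
manner, but for now we do not care about rigor.) The integral becomes a complex
Gaussian integral,

$$\int \exp\big(-iy^2\big)\, d^3y = \Big(\frac{\pi}{i}\Big)^{3/2}\,. \tag{9.19}$$

Since the integrand is holomorphic, (9.19) is easily verified by a simple deformation
of the integration path that changes the integral into a typical Gaussian integral [see
(5.8)]. Thus

$$\exp\Big(-\frac{i}{\varepsilon\hbar}\frac{mx^2}{2t}\Big)\psi_\varepsilon(x,t) \approx \Big(\frac{m}{i\hbar t}\Big)^{3/2}\hat\psi\Big(\frac{mx}{\hbar t}\Big)\,,$$

or

$$\psi_\varepsilon(x,t) \approx \Big(\frac{m}{i\hbar t}\Big)^{3/2}\exp\Big(\frac{i}{\varepsilon\hbar}\frac{mx^2}{2t}\Big)\hat\psi\Big(\frac{mx}{\hbar t}\Big)\,. \tag{9.20}$$

We introduce

$$S_{\rm class}(x,t) := \frac{mx^2}{2t}\,, \qquad R(x,t) := \Big(\frac{m}{i\hbar t}\Big)^{3/2}\hat\psi\Big(\frac{mx}{\hbar t}\Big)\,,$$

to get

$$\psi_\varepsilon(x,t) \approx R(x,t)\exp\Big(\frac{i}{\varepsilon\hbar}S_{\rm class}(x,t)\Big)\,. \tag{9.21}$$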


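Thus on the macroscopic scale the Bohmian velocity field is given by

$$\begin{aligned}
v^{\psi_\varepsilon}(x,t) = v^\psi\Big(\frac{x}{\varepsilon},\frac{t}{\varepsilon}\Big)
&= \frac{\hbar}{m}\,\mathrm{Im}\,\frac{\nabla_y\psi(y,t/\varepsilon)}{\psi(y,t/\varepsilon)}\bigg|_{y=x/\varepsilon}\\
&= \varepsilon\frac{\hbar}{m}\,\mathrm{Im}\,\frac{\nabla\psi_\varepsilon(x,t)}{\psi_\varepsilon(x,t)} \approx \frac{1}{m}\nabla S_{\rm class}(x,t) = \frac{x}{t}\,,
\end{aligned} \tag{9.22}$$

since the contribution coming from $R$ or $\hat\psi$, which is complex and thus contributes to
the derivative, is of order $\varepsilon$ and thus negligible. Evaluated for macroscopic Bohmian
trajectories, this gives

$$\dot X_\varepsilon(x,t) = v^{\psi_\varepsilon}\big(X_\varepsilon(x,t),t\big) \approx \frac{X_\varepsilon(x,t)}{t}\,, \tag{9.23}$$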


so macroscopically the Bohmian trajectories will indeed become classical, in the
sense that they become straight lines with the macroscopic velocities $X_\varepsilon(x,t)/t$. We
can reformulate this as a result holding for the long time asymptotics of Bohmian
trajectories which are guided by a freely evolving wave function. In view of (9.15),
we can write

$$\dot X(x,t/\varepsilon) = \dot X_\varepsilon(\varepsilon x,t) \approx \frac{X_\varepsilon(\varepsilon x,t)}{t} = \frac{X(x,t/\varepsilon)}{t/\varepsilon}\,. \tag{9.24}$$


This shows that, in the case of free motion in the long run, i.e., asymptotically
($t/\varepsilon \to \infty$ for $\varepsilon \to 0$), the Bohmian trajectories become straight lines with the
asymptotic velocity $X(x,t)/t$, for $t$ big [6]. In particular we shall see in Chap. 16 on
scattering that, after leaving the scattering center, a scattered particle will move
asymptotically along a straight line, just as one observes in cloud chambers.
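For a Gaussian packet the straightening of the trajectories can be followed explicitly. The
sketch below relies on a standard closed-form result that is not derived in the text and is
taken here as an assumption: for a free one-dimensional Gaussian packet with initial width
$\sigma_0$ and mean wave number $k_0$, the Bohmian trajectories are
$X(t) = v_0 t + X_0\sqrt{1 + (t/\tau)^2}$ with $v_0 = \hbar k_0/m$ and $\tau = 2m\sigma_0^2/\hbar$.
The parameter values are illustrative and units are such that $\hbar = m = 1$.

```python
import math

HBAR = MASS = 1.0       # illustrative units
SIGMA0, K0 = 1.0, 0.5   # hypothetical initial width and mean wave number
TAU = 2.0 * MASS * SIGMA0**2 / HBAR
V0 = HBAR * K0 / MASS


def trajectory(x0, t):
    """Bohmian trajectory of a free 1D Gaussian packet started at x0 (assumed closed form)."""
    return V0 * t + x0 * math.sqrt(1.0 + (t / TAU) ** 2)


def velocity_field(x, t):
    """Bohmian velocity field of the same packet."""
    return V0 + (x - V0 * t) * t / (TAU**2 + t**2)


def mismatch(x0, t):
    """Actual velocity minus the straight-line value X(t)/t, cf. (9.24)."""
    x = trajectory(x0, t)
    return velocity_field(x, t) - x / t


for t in (1.0, 10.0, 100.0, 1000.0):
    print(t, mismatch(0.7, t))
```

The mismatch between the velocity and $X(t)/t$ is of order one at times comparable to $\tau$
and decays like $1/t^2$ afterwards, which is the one-dimensional Gaussian instance of the long
time statement (9.24).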

The moral of (9.20) is best understood when we forget $\varepsilon$ (put it equal to unity)
and consider $x,t$ large! Equation (9.20) gives us the asymptotic form of the wave
function for large times (or for large distances, depending on one's point of view):

$$\psi(x,t) \approx \Big(\frac{m}{i\hbar t}\Big)^{3/2}\exp\Big(\frac{i}{\hbar}\frac{mx^2}{2t}\Big)\hat\psi\Big(\frac{mx}{\hbar t}\Big)\,. \tag{9.25}$$

This says that, for large times, the wave function will be localized at positions $x$
such that the “momentum” $mx/t$, or more precisely the wave vector $(m/\hbar)(x/t)$,
lies in the support of the Fourier transform $\hat\psi$ of $\psi$. That is why the “momentum
distribution” is given by $|\hat\psi|^2$, as we shall now show in a little more detail.
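The asymptotic form (9.25) can be compared with an exactly solvable case. The sketch below
assumes one space dimension (so the exponent $3/2$ becomes $1/2$), units with $\hbar = m = 1$,
and a Gaussian initial wave function of hypothetical width $\sigma_0 = 1$, for which both the
exact free evolution and the Fourier transform are known in closed form. The comparison is made
along rays $x = vt$, which is where the asymptotic statement lives.

```python
import numpy as np

HBAR = MASS = 1.0  # illustrative units
SIGMA0 = 1.0       # hypothetical initial width
TAU = 2.0 * MASS * SIGMA0**2 / HBAR


def psi_exact(x, t):
    """Exact free evolution of (2 pi sigma0^2)^(-1/4) exp(-x^2 / (4 sigma0^2))."""
    z = 1.0 + 1j * t / TAU
    return (2 * np.pi * SIGMA0**2) ** -0.25 * z**-0.5 * np.exp(-(x**2) / (4 * SIGMA0**2 * z))


def psi_hat(k):
    """Fourier transform of the initial Gaussian."""
    return (2 * SIGMA0**2 / np.pi) ** 0.25 * np.exp(-(SIGMA0**2) * k**2)


def psi_asymptotic(x, t):
    """One-dimensional analogue of (9.25)."""
    prefactor = (MASS / (1j * HBAR * t)) ** 0.5
    phase = np.exp(1j * MASS * x**2 / (2 * HBAR * t))
    return prefactor * phase * psi_hat(MASS * x / (HBAR * t))


def relative_error(t, velocities=np.linspace(-0.5, 0.5, 11)):
    """Largest relative deviation between exact and asymptotic form along the rays x = v t."""
    x = velocities * t
    exact, approx = psi_exact(x, t), psi_asymptotic(x, t)
    return float(np.max(np.abs(exact - approx) / np.abs(exact)))


for t in (10.0, 100.0, 1.0e4):
    print(t, relative_error(t))
```

The deviation shrinks roughly like $\tau/t$, so the asymptotic form is useless at times of
order $\tau$ and excellent at times that are macroscopic on that scale.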
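For $\varepsilon$ small, let us call

$$V_\infty := \frac{X_\varepsilon(x,t)}{t} \tag{9.26}$$

the asymptotic velocity. In (9.23), we identified this as the velocity of the (straight)
macroscopic Bohmian trajectory. Using the feature (8.19) of the quantum equilibrium
distribution and the asymptotic form (9.20), we show that $V_\infty$ is distributed
according to $|\hat\psi|^2$. For any function $f$,

$$\begin{aligned}
\mathbb{E}^\psi\big(f(V_\infty)\big) &= \int f\Big(\frac{X_\varepsilon(x,t)}{t}\Big)|\psi_\varepsilon(x,0)|^2\, d^3x\\
&= \int f\Big(\frac{x}{t}\Big)|\psi_\varepsilon(x,t)|^2\, d^3x\\
&\approx \int f\Big(\frac{x}{t}\Big)\Big(\frac{m}{\hbar t}\Big)^3\Big|\hat\psi\Big(\frac{mx}{\hbar t}\Big)\Big|^2 d^3x\\
&= \int f(v)\Big(\frac{m}{\hbar}\Big)^3\Big|\hat\psi\Big(\frac{mv}{\hbar}\Big)\Big|^2 d^3v\,,
\end{aligned} \tag{9.27}$$

where we have used the natural substitution $v = x/t$. We shall reconsider this in
a rigorous manner in Chap. 15. To sum up, what we see on the macroscopic scale
are straight trajectories starting at the origin, with velocity distribution $|\hat\psi|^2$. In other
words, the classical phase space ensemble $\delta(x)|\hat\psi(k)|^2$ is transported along classical
force free trajectories. If one wants to have another starting point $x_0$, the initial wave
function must be chosen to be supported around that point, $\psi(\cdot - x_0/\varepsilon, 0)$.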

All the above ultimately results from the dispersion of wave groups (waves of dif- 
ferent wavelengths have different speeds), which we already alluded to in Chap. 6. 
We remark in passing that the distribution of the free asymptotic velocity [which
according to (9.24) and (9.26) is given by $|\hat\psi|^2$] has some meaning for Heisenberg's
uncertainty relation. The latter relates this distribution to that of the initial position
of the particle. We can already guess that, when all is said and done, Heisenberg’s 
uncertainty relation will be recognized as a simple consequence of Bohmian me- 
chanics and quantum equilibrium. 
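That the asymptotic velocity is distributed according to $|\hat\psi|^2$ can be checked in the
Gaussian case. The sketch below again assumes one dimension, units with $\hbar = m = 1$, and
the standard closed form $X(t) = v_0 t + X_0\sqrt{1 + (t/\tau)^2}$ for the Bohmian trajectories
of a free Gaussian packet (an assumption, not derived in the text), so that the asymptotic
velocity is $v_0 + X_0/\tau$. The density of this quantity, with $X_0$ distributed according to
$|\psi(\cdot,0)|^2$, is compared with $(m/\hbar)\,|\hat\psi(mv/\hbar)|^2$, where $\hat\psi$ is
computed by direct numerical quadrature. All parameter values are illustrative.

```python
import numpy as np

HBAR = MASS = 1.0       # illustrative units
SIGMA0, K0 = 0.8, 1.5   # hypothetical initial width and mean wave number
TAU = 2.0 * MASS * SIGMA0**2 / HBAR
V0 = HBAR * K0 / MASS

x = np.linspace(-12.0 * SIGMA0, 12.0 * SIGMA0, 4001)
dx = x[1] - x[0]
psi0 = (2 * np.pi * SIGMA0**2) ** -0.25 * np.exp(-(x**2) / (4 * SIGMA0**2) + 1j * K0 * x)


def psi_hat(k):
    """Fourier transform of psi0 at wave number k, by direct quadrature."""
    return np.sum(np.exp(-1j * k * x) * psi0) * dx / np.sqrt(2 * np.pi)


def density_from_momentum(v):
    """Velocity density predicted by the Fourier transform: (m/hbar) |psi_hat(m v / hbar)|^2."""
    return (MASS / HBAR) * abs(psi_hat(MASS * v / HBAR)) ** 2


def density_from_trajectories(v):
    """Push-forward of |psi0|^2 under X0 -> V0 + X0 / TAU (asymptotic Gaussian trajectories)."""
    x0 = (v - V0) * TAU
    return TAU * np.exp(-(x0**2) / (2 * SIGMA0**2)) / np.sqrt(2 * np.pi * SIGMA0**2)


velocities = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
from_momentum = np.array([density_from_momentum(v) for v in velocities])
from_trajectories = density_from_trajectories(velocities)

print(from_momentum)
print(from_trajectories)
```

The two rows agree: a narrow initial position distribution (small $\sigma_0$) is stretched by the
factor $1/\tau \propto 1/\sigma_0^2$ into a broad velocity distribution, which is the reciprocity
behind the uncertainty relation mentioned above.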

Let us go on with $S_{\rm class}(x,t) := mx^2/2t$, the Hamilton–Jacobi function of a free
particle of mass $m$. Let us call wave functions of the form (9.21) local plane wave
packets. They consist of “local” plane waves, where each of them produces a straight
line as trajectory. We can lift this picture to the situation where a wave moves in
some potential $V$ that varies on a macroscopic scale, meaning for example that a
relation like (9.14) is satisfied. Then one expects the wave to obtain the form (9.21)
once again, but now with the Hamilton–Jacobi function $S_{\rm class}(V)$ that contains the
potential $V$. Moreover, the local plane waves will be guided by the potential, changing
the wavelengths in an analogous way to what happens when light passes through
an optical medium. The classical phase space ensemble $\delta(x - x_0)|\hat\psi(k)|^2$ will now
be transported along the classical trajectories governed by $S_{\rm class}(V)$.

In Sect. 2.3, we said that the Hamilton—Jacobi function was useless because typ- 
ically it cannot be defined in a unique manner, at least, not if there are two separate 
points in configuration space that are, in a given time interval, connected by more 
than one (classical) trajectory, as we mentioned for the case of a ball reflected by a 
wall. Here we encounter this difficulty once more. If two different (classical) trajec- 
tories can pass from one point in configuration space to another, then this means that 
two local plane waves can originate from one (macroscopic) point and meet again 
in another, thereby interfering! The classical trajectories simply move through each 
other, while Bohmian trajectories cannot do that. Thus whenever the local plane 
waves meet again, we do not have classicality. But this is a typical situation! So have 
we uncovered a serious problem with Bohmian mechanics? Of course, we have not. 
We have simply ignored the fact that there is a world surrounding the particles we 
wish to look at. We have ignored the effects of decoherence, the “reading” of the 
environment. 

So the full picture is as follows. Dissipation produces local plane waves that 
cannot interfere any more because of decoherence, each of them being multiplied 
by a wave function for the environment, as in the measurement experiments. We are 
thus left with one local plane wave, the one which guides the particle, which moves 
along a classical trajectory. 


9.5 Some Further Observations 


9.5.1 Dirac Formalism, Density Matrix, 
Reduced Density Matrix, and Decoherence 


Effective wave functions, as we discussed them above, emerge from decoherence, 
which arises from interaction of the system in question with its environment. The 
environment may be a piece of apparatus (pointer) in a carefully designed experi- 
ment or simply the noisy environment that is always present. In the latter case, it is 
feasible that a random evolution for the effective wave function of the system might 
emerge from an analysis of the combined system and environment, in the same way 
as a Wiener process emerges from the analysis of the motion of a Brownian particle 
in its environment. 

The types of evolution one would expect to emerge are those that have been sug- 
gested in the so-called spontaneous localization models [1]. These describe Brow- 
nian motion as a process on the space of wave functions, which, mathematically 
speaking, will be chosen to be the Hilbert space of square (Lebesgue) integrable 
functions. The wave functions thus follow diffusion-like paths through Hilbert 
space. The probability distribution of random wave functions is given by density
matrices. But beware! A density matrix need not have the meaning of a probability
distribution of random wave functions. We shall see this in a moment.

Phenomenological evolution equations for the so-called reduced density matrices
(the analogue of the heat equation for Brownian motion) have been studied rather
extensively for a long time now. The evolution of a density matrix through
such phenomenological equations, which in the course of time lead to a vanishing of 
the “off-diagonal” elements, has become a celebrated solution of the measurement 
problem. Celebrated because it is a solution from “within” quantum mechanics. 
However, the vanishing of the off-diagonal elements is nothing but a rephrasing of 
the fapp impossibility of bringing wave packets belonging to different pointer posi- 
tions into interference. The sole purpose of the following section is to acquaint the 
reader with the basic technicalities so that she/he need not feel intimidated if she/he 
encounters the overwhelming technical arguments that are often claimed to be “so- 
lutions” to the measurement problem. We shall also quickly introduce the Dirac 
formalism, which is a wonderful formalism for vector spaces with a scalar product. 


Remark 9.3. On the Dirac and Density Matrix Formalisms 
The Dirac formalism is a symbolism which is well adapted to the computation of 
expectation values in quantum equilibrium, since they become expressions of scalar 
product type. In Sect. 15.2.1, we shall review the Dirac formalism with more math- 
ematical background in hand. 

A wave function $\psi$ (i.e., an element of the Hilbert space) is represented by the
symbol $|\psi\rangle$, and the projection (in the sense of scalar products) of $\varphi$ onto $\psi$ is
denoted by

\[ \langle\psi|\varphi\rangle = \int \psi^*(x)\,\varphi(x)\,\mathrm{d}^3x . \]


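That is, $\langle\cdot|\cdot\rangle$ stands for the scalar product on the Hilbert space. $|x\rangle$ symbolizes the
wave function which is “localized” at $x$, and one reads as follows:

\[
\begin{aligned}
\sqrt{\langle\psi|\psi\rangle} &= \|\psi\| = \text{norm of } \psi , \\
\langle x|\psi\rangle &= \psi(x) = \text{value of } \psi \text{ at } x , \\
\langle\psi|x\rangle &= \psi^*(x) , \\
\langle x'|x\rangle &= \delta(x' - x) , \\
|x\rangle\langle x| &= \text{orthogonal projection on } |x\rangle .
\end{aligned}
\]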


The wave function $\psi$ is the following superposition of the $|x\rangle$:

\[ |\psi\rangle = \int |x\rangle\langle x|\psi\rangle\,\mathrm{d}^3x , \tag{9.27} \]

which is simply the coordinate representation of the vector $|\psi\rangle$ in the basis $|x\rangle$. In
particular, for the identity $I$, we obtain

\[ \int |x\rangle\langle x|\,\mathrm{d}^3x = I , \tag{9.28} \]


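and the scalar product is simply

\[ \langle\psi|\psi\rangle = \int \langle\psi|x\rangle\langle x|\psi\rangle\,\mathrm{d}^3x . \]

We introduce the operator $\hat x$ by defining its action on the basis $|x\rangle$ as

\[ \hat x|x\rangle = x|x\rangle , \tag{9.29} \]

or alternatively by its matrix elements

\[ \langle x'|\hat x|x\rangle = x\,\delta(x' - x) . \]

One finds (rather easily) that the matrix elements of the $\nabla$ operator are

\[ \langle x'|\nabla|x\rangle = \delta(x' - x)\,\partial_x . \tag{9.30} \]

Likewise for $\Delta$

\[ \langle x'|\Delta|x\rangle = \delta(x' - x)\,\partial_x^2 . \tag{9.31} \]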

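The Schrödinger equation now reads

\[ i\hbar\,\frac{\partial}{\partial t}|\psi_t\rangle = H|\psi_t\rangle , \]

with $H$ as Hamilton operator. As for any linear equation, the (formal) solution is
given by

\[ |\psi_t\rangle = \exp\left(-\frac{i}{\hbar}Ht\right)|\psi_0\rangle . \]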


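Using the Dirac notation, the expectation value of the Bohmian position can be
written as

\[
\begin{aligned}
\langle X\rangle^{\psi}(t) &= \mathbb{E}^{\psi}\big(X(t)\big) \\
&= \int x\,|\psi(x,t)|^2\,\mathrm{d}^3x \\
&= \langle\psi_t|\hat x|\psi_t\rangle \\
&= \int \mathrm{d}^3x\,\langle\psi_t|x\rangle\langle x|\hat x|\psi_t\rangle \\
&= \mathrm{tr}\big(\hat x\,|\psi_t\rangle\langle\psi_t|\big) \\
&= \mathrm{tr}(\hat x\rho_t) ,
\end{aligned}
\tag{9.32}
\]

where $\rho_t = |\psi_t\rangle\langle\psi_t|$ is called the density matrix or statistical operator, and tr denotes
the trace. Note that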


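\[
\begin{aligned}
\frac{\partial}{\partial t}\rho_t &= \frac{\partial}{\partial t}\big(|\psi_t\rangle\langle\psi_t|\big) \\
&= \left(\frac{\partial}{\partial t}|\psi_t\rangle\right)\langle\psi_t| + |\psi_t\rangle\left(\frac{\partial}{\partial t}\langle\psi_t|\right) \\
&= -\frac{i}{\hbar}H|\psi_t\rangle\langle\psi_t| + |\psi_t\rangle\langle\psi_t|\left(\frac{i}{\hbar}H\right) \\
&= -\frac{i}{\hbar}[H,\rho_t] ,
\end{aligned}
\tag{9.33}
\]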
where [A,B] = AB — BA denotes the commutator of the two operators A and B. This 
is the quantum mechanical analogue of Liouville’s equation. It is called the von 
Neumann equation. 

Now to the point. The object $\rho = |\psi\rangle\langle\psi|$ has matrix elements

\[ \rho(x,x') = \langle x|\psi\rangle\langle\psi|x'\rangle = \psi(x)\,\psi^*(x') , \tag{9.34} \]

and in view of (9.27), we may write $\rho$ in the form

\[ \rho = |\psi\rangle\langle\psi| = \iint \mathrm{d}x\,\mathrm{d}x'\,\psi(x)\,\psi^*(x')\,|x\rangle\langle x'| . \tag{9.35} \]

This is often called a pure state.⁴ In contrast, the diagonal density matrix

\[ \rho = \int \mathrm{d}x\,|\psi(x)|^2\,|x\rangle\langle x| \tag{9.36} \]

can be read as a statistical mixture of the localized wave packets $|x\rangle$, where each
packet appears with probability $|\psi(x)|^2$. This is why the density matrix is also called
the statistical operator.

For any kind of density matrix we can compute averages according to (9.32). But
sometimes a density matrix is neither a pure state nor a mixture of states. Consider
Fig. 9.5 and the scattered light waves, which are different depending on whether
$\psi_L = |l\rangle$ or $\psi_R = |r\rangle$ is effective. We shall symbolize the light waves by “pointer”
wave functions $\Phi_0 = |0\rangle$, $\Phi_L = |L\rangle$ and $\Phi_R = |R\rangle$. We can then describe the process
as follows. Initially, we have $(\alpha_l|l\rangle + \alpha_r|r\rangle)|0\rangle$, where the environment (the light)
is represented by the unscattered state $|0\rangle$. From this we obtain $\alpha_l|l\rangle|L\rangle + \alpha_r|r\rangle|R\rangle$.
The wave function of the system is understood to be time dependent, so that $|l\rangle$
denotes a wave function moving to the left and $|r\rangle$ denotes a wave function moving
to the right. The density matrix is a pure state of the form (9.35), and its change in
time is given by

\[
\begin{aligned}
\rho &= |\alpha_l|^2\,|l\rangle|0\rangle\langle l|\langle 0| + |\alpha_r|^2\,|r\rangle|0\rangle\langle r|\langle 0|
 + \alpha_l\alpha_r^*\,|l\rangle|0\rangle\langle r|\langle 0| + \alpha_r\alpha_l^*\,|r\rangle|0\rangle\langle l|\langle 0| \\
&\quad\xrightarrow{\;t\to T\;} \\
\rho_T &= |\alpha_l|^2\,|l\rangle|L\rangle\langle l|\langle L| + |\alpha_r|^2\,|r\rangle|R\rangle\langle r|\langle R|
 + \alpha_l\alpha_r^*\,|l\rangle|L\rangle\langle r|\langle R| + \alpha_r\alpha_l^*\,|r\rangle|R\rangle\langle l|\langle L| .
\end{aligned}
\tag{9.37}
\]

⁴ For any pure state we have $\rho^2 = \rho$, which does not hold for a mixture (9.36).


This looks more complicated than it really is. Let us focus on the position of the 
particle, i.e., we are interested in expectation values of functions of the particle 
position alone. In that case we can “trace out” the environment. That is, in (9.32), 
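we take the trace over all the states $|Y\rangle$ of the environment. This partial tracing yields
the reduced density matrix

\[
\begin{aligned}
\rho_T^{\mathrm{red}}(x,x') &= [\mathrm{tr}_Y\rho_T](x,x') \\
&= \int \mathrm{d}Y\,\langle Y|\langle x|\Big( |\alpha_l|^2\,|l\rangle|L\rangle\langle l|\langle L| + |\alpha_r|^2\,|r\rangle|R\rangle\langle r|\langle R| \\
&\qquad\qquad + \alpha_l\alpha_r^*\,|l\rangle|L\rangle\langle r|\langle R| + \alpha_r\alpha_l^*\,|r\rangle|R\rangle\langle l|\langle L| \Big)|x'\rangle|Y\rangle \\
&= |\alpha_l|^2\,\langle x|l\rangle\langle l|x'\rangle \int \mathrm{d}Y\,|\langle L|Y\rangle|^2
 + |\alpha_r|^2\,\langle x|r\rangle\langle r|x'\rangle \int \mathrm{d}Y\,|\langle R|Y\rangle|^2 \\
&\quad + \alpha_l\alpha_r^*\,\langle x|l\rangle\langle r|x'\rangle \int \mathrm{d}Y\,\langle Y|L\rangle\langle R|Y\rangle
 + \alpha_r\alpha_l^*\,\langle x|r\rangle\langle l|x'\rangle \int \mathrm{d}Y\,\langle Y|R\rangle\langle L|Y\rangle ,
\end{aligned}
\tag{9.38}
\]

which can be used to compute all the relevant expectations.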

Now we must think as follows. Let us picture the scattering states $|R\rangle$ and $|L\rangle$
as pointer states which have macroscopically disjoint supports in (light wave)
configuration space. The light waves occupy different regions. Since $\langle Y|R\rangle$ contributes
only with those $Y$ that are in the support of $|R\rangle$, and $\langle Y|L\rangle$ contributes only with
those $Y$ in the support of $|L\rangle$, this then shows that, according to the very small
overlap of $|L\rangle$ and $|R\rangle$, the last two integrals in (9.38) are very small. We thus arrive
at a result which easily ranks among the most severely misinterpreted results
in science: the reduced density matrix acquires almost diagonal form. Therefore
it looks like the density matrix of the mixture of the wave functions $|l\rangle$ and $|r\rangle$
with weights $|\alpha_l|^2$ and $|\alpha_r|^2$, respectively, where we have used the normalization
$\int \mathrm{d}Y\,|\langle Y|R\rangle|^2 = \int \mathrm{d}Y\,|\langle Y|L\rangle|^2 = 1$:

\[ \rho_T^{\mathrm{red}} = \mathrm{tr}_Y\,\rho_T \approx |\alpha_l|^2\,|l\rangle\langle l| + |\alpha_r|^2\,|r\rangle\langle r| . \tag{9.39} \]
Why is this result often misunderstood? Suppose we do not know the wave function
of a system, but our ignorance about it can be expressed in terms of probabilities,
viz., with probability $|\alpha_l|^2$ the wave function is $|l\rangle$ and with probability $|\alpha_r|^2$ it is
$|r\rangle$. Then the corresponding density matrix would be exactly the right-hand side of
(9.39). Therefore one may easily be trapped into thinking that the left-hand side of
(9.39), which approximately equals the right-hand side, also submits to the ignorance
interpretation. In short, decoherence seems to turn “and” into “or”. But (9.39)
is nothing but a rephrasing of (9.3). The only difference is the mathematical
formulation.
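
The suppression of the off-diagonal terms in (9.38) can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the environment states $|L\rangle$ and $|R\rangle$ are modeled as discretized Gaussians on a one-dimensional grid, and the function names, the grid, and the separations are hypothetical choices, not taken from the text.

```python
import math

def inner(u, v):
    """Scalar product <u|v> of two amplitude lists (antilinear in u)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def gaussian_state(center, grid):
    """Assumed toy 'pointer' state: a Gaussian bump, normalized on the grid."""
    amps = [math.exp(-0.5 * (y - center) ** 2) for y in grid]
    norm = math.sqrt(sum(a * a for a in amps))
    return [a / norm for a in amps]

def reduced_density_matrix(alpha_l, alpha_r, env_L, env_R):
    """2x2 reduced density matrix in the basis (|l>, |r>) for the state
    alpha_l |l>|L> + alpha_r |r>|R>, obtained by tracing out the environment.
    The off-diagonal entries carry the overlap <R|L> (resp. <L|R>)."""
    a_l, a_r = complex(alpha_l), complex(alpha_r)
    return [
        [abs(a_l) ** 2 * inner(env_L, env_L), a_l * a_r.conjugate() * inner(env_R, env_L)],
        [a_r * a_l.conjugate() * inner(env_L, env_R), abs(a_r) ** 2 * inner(env_R, env_R)],
    ]

grid = [-20 + 0.1 * n for n in range(401)]   # hypothetical environment coordinate Y
alpha = 1 / math.sqrt(2)

for separation in (0.0, 2.0, 8.0):
    env_L = gaussian_state(-separation / 2, grid)
    env_R = gaussian_state(+separation / 2, grid)
    rho = reduced_density_matrix(alpha, alpha, env_L, env_R)
    print(separation, abs(rho[0][1]))        # off-diagonal weight shrinks with separation
```

With coinciding environment states the off-diagonal entry has the full weight $|\alpha_l\alpha_r|$, while for well separated supports it becomes negligible. The diagonal entries are unaffected, and the total wave function remains a superposition throughout, which is the point of the warning above.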

Many physicists have nevertheless taken (9.39) as the solution to the measurement
problem, as if Schrödinger had not been aware of this calculation. Of course,
that is nonsense. Schrödinger based his cat story on the fapp impossibility of the
interference of macroscopically disjoint wave packets. So let us repeat [7]:⁵ There
is a difference between a shaky or out-of-focus photograph and a snapshot of clouds
and fog banks.

Everything else that needs to be said has already been pointed out in Remark 9.1.
Only with Bohmian mechanics can (9.39) be interpreted as a mixture.


Remark 9.4. Collapse Equations 

In Bohmian mechanics the result (9.39) means that just one of the wave functions 
will actually guide the particle. Moreover, the environment will continue to “read” 
the particle position, so the effective guiding wave will continue to be a localized 
wave packet. We wish to discuss briefly the type of equations that govern the cor- 
responding reduced density matrix, which represents the probability distribution of 
the random wave function. That is, we wish to describe the time evolution of the 
reduced density matrix. 

The evolution equation arises from the full quantum mechanical description of 
system and environment in an appropriate scaling, and it must satisfy two desiderata: 
the off-diagonal elements must go to zero and the Newtonian equations of motion 
should be satisfied in the mean. Otherwise we could not expect Newtonian behavior 
of Bohmian trajectories for highly localized wave functions. As a (mathematical) 
example we give the simplest such equation for $\rho_t$, namely,

\[ \frac{\partial}{\partial t}\rho_t = -\frac{i}{\hbar}[H,\rho_t] + \mathcal{L}_t , \tag{9.40} \]

with ($\Lambda > 0$)

\[ \langle x|\mathcal{L}_t|x'\rangle = -\Lambda\,(x - x')^2\,\rho_t(x,x') . \tag{9.41} \]

For $H = 0$, the solution is

\[ \rho_t(x,x') = e^{-\Lambda (x - x')^2 t}\,\rho_0(x,x') , \]

and one sees that the off-diagonal elements vanish at the rate $\Lambda$. However, this is
an unphysical model in the sense that the strength of decoherence should saturate once
the distance $x - x'$ has achieved a certain macroscopic value, whereas here it keeps
growing like $(x - x')^2$. Notwithstanding these
shortcomings, equations of this type have been studied and “physical values” for $\Lambda$
have been proposed [9]. However, we have seen that (9.40) and (9.41) describe at
least roughly the emergence of a statistical mixture.

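Next let us check whether the Newtonian equations of motion hold in the mean.
We differentiate the last equality of (9.32), and with (9.40), we obtain

\[
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}\langle X\rangle(t) &= \frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{tr}(\hat x\rho_t)
 = \mathrm{tr}\left(\hat x\,\frac{\partial\rho_t}{\partial t}\right) \\
&= -\frac{i}{\hbar}\,\mathrm{tr}\big(\hat x[H,\rho_t]\big) + \mathrm{tr}(\hat x\mathcal{L}_t) .
\end{aligned}
\]

⁵ Translation by John D. Trimmer in [8].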

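Differentiating once more yields⁶

\[
\frac{\mathrm{d}^2}{\mathrm{d}t^2}\langle X\rangle(t)
 = -\frac{1}{m}\,\mathrm{tr}\big(\nabla V(\hat x)\,\rho_t\big)
 - \frac{i}{\hbar}\,\mathrm{tr}\big(\hat x[H,\mathcal{L}_t]\big)
 + \mathrm{tr}\left(\hat x\,\frac{\partial\mathcal{L}_t}{\partial t}\right) .
\]
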
Note that we have already derived the first part of the equation in (9.11), without 
using abstract operator calculus. The result is consistent with classical motion if 


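\[ \mathrm{tr}\left(\hat x\,\frac{\partial\mathcal{L}_t}{\partial t}\right) = 0 \tag{9.42} \]

and

\[ \mathrm{tr}\big(\hat x[H,\mathcal{L}_t]\big) = \mathrm{tr}(\hat x H\mathcal{L}_t) - \mathrm{tr}(\hat x\mathcal{L}_t H) = 0 . \tag{9.43} \]

For (9.42), it suffices that $\mathrm{tr}(\hat x\mathcal{L}_t)$ be constant. One requires

\[ 0 = \mathrm{tr}(\hat x\mathcal{L}_t) = \int \mathrm{d}x\,x\,\langle x|\mathcal{L}_t|x\rangle , \tag{9.44} \]

where in the last step we have computed the trace and used (9.29). The above certainly
holds if the kernel $\langle x'|\mathcal{L}_t|x\rangle$ vanishes on the diagonal. Concerning (9.43), note
that the trace is invariant under cyclic permutations:

\[ \mathrm{tr}(\hat x\mathcal{L}_t H) = \mathrm{tr}(H\hat x\mathcal{L}_t) , \tag{9.45} \]

so that from (9.43) we obtain the requirement

\[ \mathrm{tr}\big(\hat x[H,\mathcal{L}_t]\big) = \mathrm{tr}\big([\hat x,H]\mathcal{L}_t\big) = 0 . \tag{9.46} \]

Next we observe that, since $V$ is a function of the position operator $\hat x$, we have

\[ [\hat x, V(\hat x)] = 0 , \]

and thus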

\[
\langle x|[\hat x,H]|x'\rangle
 = -\frac{\hbar^2}{2m}\,\langle x|[\hat x,\Delta]|x'\rangle + \langle x|[\hat x,V(\hat x)]|x'\rangle
 = -\frac{\hbar^2}{2m}\,\langle x|[\hat x,\Delta]|x'\rangle .
\]

The first commutator can be handled using (9.31). Then, using (9.28), we may
straightforwardly compute

\[
\begin{aligned}
\mathrm{tr}\big([\hat x,H]\mathcal{L}_t\big) &= \int \mathrm{d}^3x \int \mathrm{d}^3x'\,\langle x|[\hat x,H]|x'\rangle\,\langle x'|\mathcal{L}_t|x\rangle \\
&= \frac{\hbar^2}{m} \int \mathrm{d}^3x \int \mathrm{d}^3x'\,\delta(x - x')\,\partial_{x'}\langle x'|\mathcal{L}_t|x\rangle .
\end{aligned}
\]

Thus (9.46) and hence (9.43) certainly hold if not only the kernel $\langle x'|\mathcal{L}_t|x\rangle$ but also
its differential (with respect to one of the variables) vanishes on the diagonal. Our
choice (9.41) is one example that fulfills both these requirements.

⁶ The untrained reader may have difficulty checking the computation. It is in fact straightforward,
but one may need some practice. For example, one should first compute $[H,\hat x]$ for a Schrödinger
Hamiltonian of the usual form $H = -(\hbar^2/2m)\Delta + V(\hat x)$. Moreover, it should be understood that the
trace remains unchanged under cyclic permutations of the arguments [see (9.45)].

Equation (9.40) with (9.41) is a special case of a general class of evolution equations
for density matrices, the class of equations of Lindblad form [1, 9, 10]. Among
other things, the Lindblad form guarantees that the solution $\rho_t$ remains positive
definite, so that its interpretation as a statistical operator remains consistent. The
general form of $\mathcal{L}$ for a self-adjoint operator $A$ (see Chap. 14 for the notion of
self-adjointness) is (see, for instance, [1, 9])

\[ \mathcal{L} = A\rho A - \frac{1}{2}\big(A^2\rho + \rho A^2\big) . \tag{9.47} \]

The choice $A = \sqrt{2\Lambda}\,\hat x$ yields

\[
\begin{aligned}
\langle x|\mathcal{L}|x'\rangle &= \langle x|\,A\rho A - \tfrac{1}{2}\big(A^2\rho + \rho A^2\big)\,|x'\rangle \\
&= 2\Lambda\,x\cdot x'\,\langle x|\rho|x'\rangle - \Lambda x^2\,\langle x|\rho|x'\rangle - \Lambda x'^2\,\langle x|\rho|x'\rangle \\
&= -\Lambda\,(x - x')^2\,\langle x|\rho|x'\rangle .
\end{aligned}
\]

Of course, if we wish to keep the Newtonian equations in the mean, $A$ must commute
with $\hat x$, i.e., it must be a function of $\hat x$. However, this is not the place to pursue this
further.
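
The algebra leading from (9.47) to the kernel (9.41) can be checked numerically on a toy discretization. The following is a minimal sketch under stated assumptions: position is restricted to four hypothetical grid points, so that $\hat x$ becomes a diagonal matrix, and the value of $\Lambda$, the grid, and the helper names are illustrative choices only.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lindblad_term(A, rho):
    """A rho A - (A^2 rho + rho A^2) / 2 for a self-adjoint matrix A, cf. (9.47)."""
    n = len(A)
    A2 = matmul(A, A)
    ArA = matmul(matmul(A, rho), A)
    A2r = matmul(A2, rho)
    rA2 = matmul(rho, A2)
    return [[ArA[i][j] - 0.5 * (A2r[i][j] + rA2[i][j]) for j in range(n)] for i in range(n)]

Lambda = 0.7                         # assumed decoherence strength
xs = [-1.0, 0.0, 0.5, 2.0]           # hypothetical position grid
psi = [0.5, 0.5, 0.5, 0.5]           # normalized toy wave function
rho = [[p * q for q in psi] for p in psi]    # pure state |psi><psi|

# A = sqrt(2 Lambda) x, represented as a diagonal matrix on the grid
A = [[math.sqrt(2 * Lambda) * xs[i] if i == j else 0.0 for j in range(4)] for i in range(4)]

L = lindblad_term(A, rho)
deviation = max(abs(L[i][j] + Lambda * (xs[i] - xs[j]) ** 2 * rho[i][j])
                for i in range(4) for j in range(4))
print(deviation)                     # difference from -Lambda (x - x')^2 rho(x, x')
```

The diagonal of the resulting matrix vanishes, so on this grid the discrete analogue of requirement (9.44) is satisfied automatically.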


9.5.2 Poincaré Recurrence 


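Recurrence also appears in quantum mechanics, in the following sense [11]. Consider
a system with discrete eigenvalues⁷ $E_n$. Let $\psi_0$ be the system wave function at
time 0, and $\varepsilon > 0$. Then there exists a time $T > 0$, such that the distance

\[ \|\psi(T) - \psi_0\| < \varepsilon . \]

⁷ This corresponds to the finite measure condition in the classical argument of Theorem 4.1.

Wave functions that are close in this sense yield almost the same statistics. Here is
the argument for recurrence. Suppose

\[ H|n\rangle = E_n|n\rangle , \]

where $|n\rangle$, $n \in \mathbb{N}$, defines a basis of eigenfunctions. Then

\[
\begin{aligned}
|\psi_t\rangle &= \exp\left(-\frac{i}{\hbar}tH\right)|\psi_0\rangle \\
&= \sum_{n=0}^{\infty} \exp\left(-\frac{i}{\hbar}tH\right)|n\rangle\langle n|\psi_0\rangle \\
&= \sum_{n=0}^{\infty} \exp\left(-\frac{i}{\hbar}tE_n\right)|n\rangle\langle n|\psi_0\rangle .
\end{aligned}
\]

Hence

\[ |\psi_t\rangle - |\psi_0\rangle = \sum_{n=0}^{\infty} \big(e^{-itE_n/\hbar} - 1\big)\,|n\rangle\langle n|\psi_0\rangle \]

and

\[
\begin{aligned}
\big\|\,|\psi_t\rangle - |\psi_0\rangle\,\big\|^2 &= \sum_{n=0}^{\infty} \big|e^{-itE_n/\hbar} - 1\big|^2\,|\langle n|\psi_0\rangle|^2 \\
&= 2\sum_{n=0}^{\infty} \left[1 - \cos\left(\frac{E_n}{\hbar}t\right)\right] |\langle n|\psi_0\rangle|^2 .
\end{aligned}
\]

For appropriate $N$,

\[ \sum_{n=N}^{\infty} \left[1 - \cos\left(\frac{E_n}{\hbar}T\right)\right] |\langle n|\psi_0\rangle|^2 \le 2\sum_{n=N}^{\infty} |\langle n|\psi_0\rangle|^2 \]

will be arbitrarily small, since

\[ \sum_{n=0}^{\infty} |\langle n|\psi_0\rangle|^2 = \langle\psi_0|\psi_0\rangle = \|\psi_0\|^2 = 1 . \]

Therefore we need only show that, for appropriately chosen $T$, the quantity

\[ 2\sum_{n=0}^{N-1} \left[1 - \cos\left(\frac{E_n}{\hbar}T\right)\right] |\langle n|\psi_0\rangle|^2 \]

also becomes arbitrarily small. This can be done because any frequency can be
approximated arbitrarily well by a rational frequency. Since $T$ can be chosen as large
as we wish, rational frequencies multiplied by $T$ can be turned into integer multiples
of $2\pi$. This underlies the rigorous proof using almost periodic functions [12].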
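
The rational-frequency idea in this argument can be sketched numerically for finitely many levels. The snippet below is an illustration only: the frequencies $E_n/\hbar$, the weights $|\langle n|\psi_0\rangle|^2$, and the function names are hypothetical, and the rational approximation via `Fraction.limit_denominator` merely stands in for the approximation step; it is not the rigorous proof of [12].

```python
import math
from fractions import Fraction

def distance_squared(freqs, weights, T):
    """||psi_T - psi_0||^2 = 2 * sum_n (1 - cos(E_n T / hbar)) |<n|psi_0>|^2,
    with freqs[n] = E_n / hbar and weights[n] = |<n|psi_0>|^2."""
    return 2.0 * sum(w * (1.0 - math.cos(float(f) * T)) for f, w in zip(freqs, weights))

def common_period(rational_freqs):
    """A time T = 2*pi*q for which f*T is an integer multiple of 2*pi
    for every rational frequency f (q = lcm of the denominators)."""
    q = 1
    for f in rational_freqs:
        d = Fraction(f).denominator
        q = q * d // math.gcd(q, d)
    return 2.0 * math.pi * q

# Rational frequencies: exact recurrence at the common period.
rational = [Fraction(1), Fraction(3, 2), Fraction(7, 4)]
weights = [0.5, 0.3, 0.2]
T_exact = common_period(rational)
print(T_exact, distance_squared(rational, weights, T_exact))

# Irrational frequency ratio: approximate recurrence via rational approximation.
irrational = [1.0, math.sqrt(2.0)]
for max_den in (12, 70):
    approx = [Fraction(f).limit_denominator(max_den) for f in irrational]
    T = common_period(approx)
    print(max_den, T, math.sqrt(distance_squared(irrational, [0.5, 0.5], T)))
```

In this particular example the finer approximation gives a later time with a smaller distance; in general, how small the distance becomes at a given candidate time depends on the quality of the simultaneous approximation of all frequencies.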



References 


1. A. Bassi, G. Ghirardi: Phys. Rep. 379 (5-6), 257 (2003)
2. M. Bell, K. Gottfried, M. Veltman (Eds.): John S. Bell on the Foundations of Quantum
   Mechanics (World Scientific Publishing Co. Inc., River Edge, NJ, 2001)
3. R. Tumulka: J. Stat. Phys. 125 (4), 825 (2006)
4. R. Tumulka: Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 462 (2070), 1897 (2006)
5. V. Allori, D. Dürr, S. Goldstein, N. Zanghì: Journal of Optics B 4, 482 (2002)
6. S. Römer, D. Dürr, T. Moser: J. Phys. A 38, 8421 (2005); math-ph/0505074
7. E. Schrödinger: Naturwissenschaften 23, 807 (1935)
8. J.A. Wheeler, W.H. Zurek (Eds.): Quantum Theory and Measurement, Princeton Series in
   Physics (Princeton University Press, Princeton, NJ, 1983)
9. D. Giulini, E. Joos, C. Kiefer, J. Kupsch, I.-O. Stamatescu, H.D. Zeh: Decoherence and the
   Appearance of a Classical World in Quantum Theory (Springer-Verlag, Berlin, 1996)
10. G.C. Ghirardi, P. Pearle, A. Rimini: Phys. Rev. A 42 (1), 78 (1990)
11. P. Bocchieri, A. Loinger: Phys. Rev. 107, 337 (1957)
12. H. Bohr: Fastperiodische Funktionen (Springer, 1932)


Chapter 10 
Nonlocality 


Bohmian mechanics is about particles guided by a wave. This is new, but not rev- 
olutionary physics. This chapter will now present a paradigm shift. It is about how 
nature is, or better, it is about how any theory which aims at a correct description 
of nature must be. Any such theory must be nonlocal. We do not attempt to define 
nonlocality (see [1] for a serious examination of the notion), but simply take it prag- 
matically as meaning that the theory contains action at a distance in the true meaning 
of the words, i.e., faster than light action between spacelike separated events. Since 
we shall exemplify the idea shortly, this should suffice for the moment. 

We note that the action at a distance in question here is such that no information 
can be sent with superluminal speed, whence no inconsistency with special relativity 
arises. However, action at a distance does seem to be at odds with special relativity. 
Einstein held the view that nonlocality is unphysical, and referred to nonlocal ac- 
tions as “ghost fields”, a notion which expressed his contempt for nonlocal theories. 
But such theories are not unfamiliar in physics. For example, Newtonian mechanics, 
which is non-relativistic, is nonlocal. Bohmian mechanics is nonlocal, too. But there 
is a noteworthy difference between the nonlocality of Newtonian mechanics and that
of Bohmian mechanics. For the latter is encoded in the wave function which lives 
on configuration space and is by its very nature a nonlocal agent. All particles are 
guided simultaneously by the wave function, and if the wave function is entangled, 
the nonlocal action does not get small with the spatial distance between the particles, 
in contrast to what happens in a Newtonian system with gravitational interaction. 

In a two-particle system, with coordinates $X_1(t)$ and $X_2(t)$, we have

\[
\dot X_1(t) = \frac{\hbar}{m_1}\,\mathrm{Im}\,
\frac{\frac{\partial}{\partial x}\psi\big(x, X_2(t)\big)\Big|_{x = X_1(t)}}{\psi\big(X_1(t), X_2(t)\big)} ,
\]

whence the velocity of $X_1$ at time $t$ depends in general on $X_2$ at time $t$, no matter
how far apart the positions are. “In general” means here that the wave function
$\psi(x,y)$ is entangled and not a product $\psi(x,y) = \varphi(x)\Phi(y)$, for example. There is
no immediate reason why the wave function should become a product when $x$ and
$y$ are far apart.¹ Therefore, if some local interaction changes the wave function at
$X_2(t)$, that will immediately affect the velocity of particle 1.
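
The dependence of the velocity of particle 1 on the position of particle 2 can be seen in a toy example. The sketch below is purely illustrative: it works in one dimension per particle with units $\hbar = m_1 = 1$, and the two wave functions (an entangled superposition of plane waves with Gaussian factors, and a product state) are assumed examples, not states discussed in the text.

```python
import cmath
import math

def velocity_1(psi, dpsi_dx, x1, x2, hbar=1.0, m1=1.0):
    """Guiding velocity of particle 1: (hbar/m1) * Im( d_x psi / psi ) at (x1, x2)."""
    return (hbar / m1) * (dpsi_dx(x1, x2) / psi(x1, x2)).imag

def g(y, center):
    """Gaussian factor for particle 2 (unnormalized; the norm cancels in the ratio)."""
    return math.exp(-(y - center) ** 2)

# Entangled toy state: wave number 1 goes with y near +1, wave number 2 with y near -1.
def psi_ent(x, y):
    return cmath.exp(1j * x) * g(y, 1.0) + cmath.exp(2j * x) * g(y, -1.0)

def dpsi_ent(x, y):
    return 1j * cmath.exp(1j * x) * g(y, 1.0) + 2j * cmath.exp(2j * x) * g(y, -1.0)

# Product toy state: a single plane wave times a Gaussian.
def psi_prod(x, y):
    return cmath.exp(1j * x) * g(y, 1.0)

def dpsi_prod(x, y):
    return 1j * cmath.exp(1j * x) * g(y, 1.0)

for x2 in (-3.0, 0.0, 3.0):
    print(x2,
          velocity_1(psi_ent, dpsi_ent, 0.0, x2),     # changes with x2
          velocity_1(psi_prod, dpsi_prod, 0.0, x2))   # independent of x2
```

For the product state the factor depending on the second coordinate cancels in the quotient, so the velocity of particle 1 is the same for every position of particle 2. For the entangled state it interpolates between the two wave numbers depending on where particle 2 is.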

Bohmian mechanics is definitely nonlocal, because the wave function is a func- 
tion on configuration space. In fact, since the wave function is an object of all quan- 
tum theories, Bohmian or not, quantum mechanics is nonlocal. One may dislike 
nonlocality. In that case one should ask: Can one do better? Is it possible to describe 
nature by a local theory? For Einstein, Podolsky, and Rosen (EPR) [2] the answer 
was unquestionably affirmative. They presented an argument based on their belief 
that all theories of nature must be local, which was supposed to prove that quantum 
mechanics is incomplete. Besides the (nonlocal) wave function (which Einstein con- 
sidered as merely expressing “probability”, which is naturally a function on phase 
space in classical physics, and hence also on configuration space), there are other — 
local — hidden variables that have been left out of the physical description. 

We no longer need to argue that quantum mechanics is incomplete, but the EPR 
argument is nevertheless of interest, as it constitutes one part of Bell’s proof of 
the nonlocality of nature. Bell’s response to the question as to whether one can do 
better is Bell’s theorem, and it answers in the negative: one cannot do better. Nature 
is nonlocal. 

Bell’s theorem has two parts. The first is the Einstein—Podolsky—Rosen argument 
[2] applied to the simplified version of the EPR Gedanken experiment considered 
by David Bohm [3], viz., the EPRB experiment. It is based on the fact that one 
can prepare a special pair (L, R) of spin 1/2 particles which fly apart in opposite 
directions (L to the left and R to the right), and which behave in the following 
well determined way. When both particles pass identically oriented Stern—Gerlach 
magnets, they get deflected in exactly opposite directions, i.e., if L moves up, R 
moves down, and vice versa. In quantum language, if L has a-spin +1/2, then R 
has a-spin —1/2, and vice versa, where a is the orientation of the magnets (see 
Fig. 10.1). This is true for all directions a. Moreover, the probability for (L up, R 
down) is 1/2. The two-particle wave function is called a singlet state [see (10.4)]. 
The total spin of this singlet state is zero. We shall give details later and simply note 
for now that such is the physical reality, and it is correctly described by quantum 
mechanics. We obtain opposite values for the spins when the particles move through 
a-directed magnets, for any a. 

Now we come to the first part of the nonlocality argument. Measuring first the a- 
spin on L, we can predict with certainty the result of the measurement of the a-spin 
on R. This is true even if the measurement events at L and R are spacelike sepa- 
rated. Suppose therefore that the experiment is arranged in such a way that a light 
signal cannot communicate the L-result to the R-particle before the R-particle passes 
SGM-R. Suppose now that “locality” holds, meaning that the spin measurement on 
one side has no superluminal influence on the result of the spin measurement on 
the other side. Then we must conclude that the value we predict for the a-spin on R 
is preexisting. It cannot have been created by the result obtained on L, because we 


¹ However, decoherence is always lurking there, awaiting an opportunity to destroy coherence, i.e.,
to produce an effective product structure.

Fig. 10.1 The EPR experiment. Two particles in the singlet spin state (10.4) fly apart and move
towards the Stern–Gerlach magnets SGM-L and SGM-R. In situation (A), the Stern–Gerlach magnets
are parallel, while in situation (B), the directions of the magnets are rotated shortly before the
particles arrive. The measurements are made in such a short time that no communication of results
transmitted with the speed of light between the left and the right is possible within the duration of
the measurement process


assume locality. Now let us reflect on that. If the value preexists, then that means 
that it exists even before the decision was taken in which direction a the spin on the 
left is to be measured. Hence the value preexists for any direction a. By symmetry 
this holds also for the values obtained on L. Therefore, by locality, we obtain the 
preexisting values of spins on either side in any direction. We collect the preexisting
values in a family of variables $X_a^{(L)}, X_a^{(R)} \in \{-1, 1\}$, with $a$ indexing arbitrary
directions and with $X_a^{(L)} = -X_a^{(R)}$.

The locality check is now simply to ask whether such preexisting values actually 
exist. In other words, do the preexisting values accommodate the measured correla- 
tions? This leads to the second part of Bell’s proof, namely to show that the answer 
is negative. One might think that this would be a formidable task. One might think 
that because we make no assumptions about the nature of the variables. They can 
conspire in the most ingenious way. Contradicting Einstein’s famous: “Subtle is the 
lord, but malicious He is not”, we might say: “The lord could have been malicious” 
in correlating the variables in such an intricate manner that they do whatever they 
are supposed to do. Of course, the particles can no longer conspire when they are 
far apart, because that is forbidden by the locality assumption. But at the time when 
the particles are still together in the source, before they fly apart, they can conspire 
to form the wildest correlations. 

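But no, this part of the proof is trivial. There is no way the variables can reproduce
the quantum mechanical (which are the Bohmian) correlations. Choose three
directions, given by unit vectors $a, b, c$, and consider the corresponding 6 variables
$X_y^{(L)}, X_y^{(R)}$, $y \in \{a, b, c\}$. They must satisfy

\[ \big(X_a^{(L)}, X_b^{(L)}, X_c^{(L)}\big) = \big(-X_a^{(R)}, -X_b^{(R)}, -X_c^{(R)}\big) . \tag{10.1} \]

We wish to reproduce the relative frequencies of the anti-correlation events

\[ X_a^{(L)} = -X_b^{(R)} , \qquad X_b^{(L)} = -X_c^{(R)} , \qquad X_c^{(L)} = -X_a^{(R)} . \]

Adding the probabilities and using the rules of probability in the inequality, we get

\[
\begin{aligned}
&P\big(X_a^{(L)} = -X_b^{(R)}\big) + P\big(X_b^{(L)} = -X_c^{(R)}\big) + P\big(X_c^{(L)} = -X_a^{(R)}\big) \\
&\quad= P\big(X_a^{(L)} = X_b^{(L)}\big) + P\big(X_b^{(L)} = X_c^{(L)}\big) + P\big(X_c^{(L)} = X_a^{(L)}\big) \qquad [\text{by (10.1)}] \\
&\quad\ge P\big(X_a^{(L)} = X_b^{(L)} \ \text{or}\ X_b^{(L)} = X_c^{(L)} \ \text{or}\ X_c^{(L)} = X_a^{(L)}\big) \\
&\quad= P(\text{sure event}) = 1 ,
\end{aligned}
\]

because $X_y^{(i)}$, $i = L, R$, $y \in \{a, b, c\}$, can only take two values.
This is thus one version of Bell’s inequality:

\[ P\big(X_a^{(L)} = -X_b^{(R)}\big) + P\big(X_b^{(L)} = -X_c^{(R)}\big) + P\big(X_c^{(L)} = -X_a^{(R)}\big) \ge 1 . \tag{10.2} \]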


We shall show in a moment that the quantum mechanical value (which is of course 
the Bohmian one) for the probability of perfect anti-correlations is 3/4 if the angles 
between a,b,c are each 120°. Therefore quantum mechanics contradicts (10.2). 
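
The “sure event” step and the quantum mechanical violation can both be checked by brute force. The sketch below is an illustration with hypothetical function names; the quantum value uses the singlet anti-correlation probability $P_{a,b} = \frac{1}{2} + \frac{1}{2}\,a\cdot b$, which is derived in Sect. 10.1.

```python
import itertools
import math

def min_agreement_count():
    """Minimum, over all assignments of +1/-1 to (X_a, X_b, X_c) on one side,
    of the number of agreeing pairs among (a,b), (b,c), (c,a)."""
    best = 3
    for xa, xb, xc in itertools.product((-1, 1), repeat=3):
        count = (xa == xb) + (xb == xc) + (xc == xa)
        best = min(best, count)
    return best

def quantum_anticorrelation(angle_deg):
    """Singlet probability 1/2 + (a.b)/2 for unit vectors enclosing the given angle."""
    return 0.5 + 0.5 * math.cos(math.radians(angle_deg))

# Every assignment of preexisting values has at least one agreeing pair, so the
# three probabilities in (10.2) must add up to at least 1 for any distribution.
print(min_agreement_count())

# Quantum mechanical value of the same sum for three directions 120 degrees apart.
quantum_sum = 3 * quantum_anticorrelation(120.0)
print(quantum_sum)
```

Since three two-valued variables cannot be pairwise different, at least one of the three events occurs for each assignment; the quantum mechanical sum nevertheless lies below 1.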

The logical structure of Bell’s nonlocality argument is thus as follows [4]. Let P
be the hypothesis of the existence of preexisting values $X_{a,b,c}^{(L,R)}$ for the spin components
relevant to this EPRB experiment. Then

\[
\begin{array}{ll}
\text{First part} & \text{quantum mechanics} + \text{locality} \Longrightarrow \text{P} , \\
\text{Second part} & \text{quantum mechanics} \Longrightarrow \text{not P} , \\
\text{Conclusion} & \text{quantum mechanics} \Longrightarrow \text{not locality} .
\end{array}
\tag{10.3}
\]


To save locality, one could hope that the quantum mechanical value would be false. 
So let us forget about quantum mechanics and all other theories and simply take 
the experimental facts. They show quite convincingly² that (10.2) is violated by
the observed relative frequencies [5]. There is no doubt that better experiments will 
corroborate the finding that (10.2) is violated. In (10.3) we can therefore replace 
“quantum mechanics” by “experimental facts”. This then is independent of any the- 
ory and yields a conclusion about nature: nature is nonlocal. The practical meaning 
is that, if you devise a theory about nature, you had better make sure that it is non- 
local, otherwise it is irrelevant. Bohmian mechanics violates Bell’s inequalities and
furthermore predicts the values measured in experiments. Concerning nonlocality,
one cannot do better than Bohmian mechanics.

² The experiments done up until now contain so-called loopholes, which are based on detector
deficiencies and the belief that nature behaves in a conspiratorial way. Since the experiments agree
well with the quantum mechanical predictions, such beliefs seem unlikely to be well founded.


10.1 Singlet State and Probabilities for Anti-Correlations 

The spin part of the singlet wave function is the antisymmetric vector

\[ \psi_s = \frac{1}{\sqrt{2}}\big(|\uparrow\rangle_1|\downarrow\rangle_2 - |\downarrow\rangle_1|\uparrow\rangle_2\big) . \tag{10.4} \]

This factor multiplies the position dependent symmetric wave function

\[ \psi(x_1,x_2) = \psi_L(x_1)\,\psi_R(x_2) + \psi_R(x_1)\,\psi_L(x_2) , \]

which arranges for one particle to move to the left and one to move to the right. This
spatial arrangement is not contained in the spin part (10.4). For the purpose of computing
the probabilities, the spatial symmetrization is rather irrelevant. Furthermore,
computing everything while maintaining the symmetrization is a bit demanding and,
for the sake of simplicity, we focus only on the term $\psi_L(x_1)\,\psi_R(x_2)\,\psi_s$, thereby viewing
the first factor in (10.4) as belonging to the particle on the left and the second
factor as belonging to the particle on the right. Then, for the purpose of computing
averages, we can forget about the position part altogether and focus on the spin part
alone. But the reader should bear in mind that it is in fact the trajectory of each
particle that determines the value of the spin.

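Now for a few facts concerning (10.4). First $|\uparrow\rangle$ is the spinor for spin up (+1/2)
in an arbitrarily chosen direction, because the singlet is completely symmetric under
change of basis. Suppose we express $|\uparrow\rangle_k, |\downarrow\rangle_k$ in another orthogonal basis $i_k, j_k$,
viz.,

\[ |\uparrow\rangle_k = i_k\cos\alpha + j_k\sin\alpha , \qquad |\downarrow\rangle_k = -i_k\sin\alpha + j_k\cos\alpha , \]

then one readily computes

\[ \psi_s = \frac{1}{\sqrt{2}}\big(|\uparrow\rangle_1|\downarrow\rangle_2 - |\downarrow\rangle_1|\uparrow\rangle_2\big) = \frac{1}{\sqrt{2}}\big(i_1 j_2 - j_1 i_2\big) . \]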

We note immediately that the total spin in the singlet state is zero. The following
computation shows what is meant by this. One considers the spin operator for the
combined system, i.e., for the two particles, which is given by

\[ a\cdot\sigma^{(1)}\otimes I + I\otimes a\cdot\sigma^{(2)} , \]

where $a\cdot\sigma^{(1)}\otimes I$ (and analogously the second summand) is to be read as follows.
The operator describes the statistics of the measurement of spin in the direction $a$ on
SGM-L, and technically the first operator factor acts on the first spinor factor, while
the second acts on the second spinor factor. The sum $a\cdot\sigma^{(1)}\otimes I + I\otimes a\cdot\sigma^{(2)}$ is thus
the operator describing the statistics of measurements of spin on the left and on the
right, which we call measurement of the total spin.

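Now using (8.25) in Chap. 8, we compute the average value, in the singlet wave
function, of the square of the total spin in an arbitrary direction, which we chose
here as the $a$ direction. We conveniently choose $|\uparrow\rangle$ and $|\downarrow\rangle$ in the singlet wave
function as eigenvectors of the spin matrix $a\cdot\sigma$. In the calculation, we use the
fact that $\langle\uparrow|I|\downarrow\rangle = 0$ and $\langle\uparrow|a\cdot\sigma|\downarrow\rangle = -\langle\uparrow|\downarrow\rangle = 0$, and we suppress the indices
1, 2 on the spin factors. Then ignoring the dimension and scale factors $\hbar/2$ for the
spin 1/2 particle, and using $(a\cdot\sigma)^2 = I$, we compute

\[
\begin{aligned}
&\langle\psi_s|\big(a\cdot\sigma\otimes I + I\otimes a\cdot\sigma\big)^2|\psi_s\rangle \\
&\quad= \langle\psi_s|(a\cdot\sigma)^2\otimes I|\psi_s\rangle + \langle\psi_s|I\otimes(a\cdot\sigma)^2|\psi_s\rangle
 + 2\,\langle\psi_s|a\cdot\sigma\otimes a\cdot\sigma|\psi_s\rangle \\
&\quad= 1 + 1 + 2\cdot\frac{1}{2}\Big(\langle\uparrow|a\cdot\sigma|\uparrow\rangle\langle\downarrow|a\cdot\sigma|\downarrow\rangle
 + \langle\downarrow|a\cdot\sigma|\downarrow\rangle\langle\uparrow|a\cdot\sigma|\uparrow\rangle\Big) \\
&\quad= 2 - 2 = 0 .
\end{aligned}
\]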

Let us now turn to the correlations

\[ E_{a,b}^{\psi_s} = \langle\psi_s|\,a\cdot\sigma^{(1)}\otimes b\cdot\sigma^{(2)}\,|\psi_s\rangle . \]

This expression is bilinear in $a$ and $b$, and rotationally invariant. Therefore the
expression must be a multiple of $a\cdot b$, i.e., $E_{a,b}^{\psi_s} = \lambda\,a\cdot b$, where $\lambda$ is determined by the
value one gets for $a = b$. For the singlet, this is (the spin values are exactly opposite)
$E_{a,a}^{\psi_s} = -1$. Hence,

\[ E_{a,b}^{\psi_s} = -a\cdot b . \tag{10.5} \]

With this we can determine the quantum mechanical anti-correlation probability 
Py, = Ph (sl) = 5) , 


where SY, (2) are the “spin values”, without the need to introduce the quantum 


formalism of joint measurement statistics. According to the rules of probability, 


ab = <P +(1 —P%) =—-2PM +1, 


a, 


that is, 
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. 4 
pY,=5 +58 b. 


Choosing 120° for the angles between a,b, and c, we get 


1 1 1 1 1 
Ws Ws Ws __ 
Pab=>57~qg—q? Pae=q>  Phe=q- 
Hence for the sum of the probabilities on the left-hand side of (10.2), we obtain the 
value 3/4, as already claimed. 
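
The singlet correlation and the value 3/4 can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the directions lie in the x-z plane, so that the spin matrix for a direction at angle theta from the z-axis is real, and it uses the basis ordering (up-up, up-down, down-up, down-down) for the two-particle spinor. All function names are illustrative.

```python
import math

def spin_matrix(theta):
    """a.sigma for a unit vector a in the x-z plane at angle theta from the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def kron(A, B):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[A[i][j] * B[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def expectation(M, psi):
    """<psi | M psi> for a real 4-vector psi and a real 4x4 matrix M."""
    Mpsi = [sum(M[r][c] * psi[c] for c in range(4)) for r in range(4)]
    return sum(psi[r] * Mpsi[r] for r in range(4))

_S = 1 / math.sqrt(2)
SINGLET = [0.0, _S, -_S, 0.0]  # (|ud> - |du>)/sqrt(2)

def correlation(theta_a, theta_b):
    """Singlet expectation of (a.sigma) x (b.sigma)."""
    return expectation(kron(spin_matrix(theta_a), spin_matrix(theta_b)), SINGLET)

def anticorrelation_probability(theta_a, theta_b):
    """Solve E = -P + (1 - P) for P."""
    return (1 - correlation(theta_a, theta_b)) / 2

# Three directions at mutual angles of 120 degrees.
angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
bell_sum = (anticorrelation_probability(angles[0], angles[1])
            + anticorrelation_probability(angles[0], angles[2])
            + anticorrelation_probability(angles[1], angles[2]))
print(bell_sum)
```

Each of the three probabilities comes out as 1/4 within rounding, and `correlation` agrees with minus the cosine of the angle between the two directions.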
Let us rephrase the argument for later reference to hidden variables. The EPR argument
yields, by the locality assumption, local hidden variables for spin, the ones
we denoted by $X^{(1,2)}_{a,b,c}$. The second part of Bell’s theorem shows that the existence
of spin hidden variables contradicts Bohmian mechanics (or quantum mechanics
for that matter). In other words, local hidden variables cannot reproduce the experimental
correlations. Expressed in yet another way, there exist no random variables
$X^{(1)}_a$, $X^{(2)}_b$ which have the quantum mechanical correlations (10.5).

The following terminology, which yields absolutely no new insights, has also 
been used to describe Bell’s theorem. The spin measurement on the right side de- 
pends on the context in which the experiment is done, i.e., in the present case, it 
depends on what happens on the left, i.e., the hidden variable is, if it exists at all, 
“contextual”. Hence Bell’s theorem asserts that non-contextual hidden variables are 
not possible. 

We conclude this section with some light entertainment: 


• Bell’s theorem has (quite often) been cited as proving that Bohmian mechanics
is impossible. Why? Presumably because Bohmian mechanics was viewed as a 
hidden variable theory and the hearsay on Bell’s theorem was that it proved that 
hidden variable theories conflict with quantum mechanics. 

• It has also been said that quantum mechanics is local despite (10.3). How can
that be? By forbidding or not believing that the steps in (10.3) are valid. How can 
that be? We do not know. 

• It has also been said that Bell’s nonlocality is nothing more than “learning at a
distance”. Wittgenstein’s blue and brown book are wrapped in packages. Now 
suppose you and your friend each get a package without knowing which of the 
two books it contains. Your friend leaves for the moon with his package. When 
the spacecraft lands, you open your package and you unwrap the blue book. You 
know immediately that your friend on the moon has the brown book. Again one 
may wonder how anyone could reach the misunderstanding that this is a possible 
reading of Bell’s nonlocality. Presumably because Bell’s article was not actually 
read, and conclusions were drawn from hearsay about Bell’s work. In his nice 
article entitled Bertlmann’s Socks and the Nature of Reality [6], Bell elaborates 
on the distinction between this unspectacular effect of learning at a distance and 
the spectacular effect of nonlocality.


10.2 Faster Than Light Signals?


Bohmian mechanics is nonlocal. The wave function acts in a nonlocal way on the 
particles. Why can one not send signals faster than light? One cannot, because of the 
quantum equilibrium hypothesis. The action at a distance which the wave function 
mediates is randomized in such a way that it is unusable. If quantum equilibrium 
were false, superluminal signalling might perhaps be possible. Since there is no evi- 
dence that quantum equilibrium is false, there is no reason to speculate any further. 

Let us show for the sake of completeness how quantum equilibrium acts here. 
Let us take a general entangled two-particle state 


$$ \psi = a\,|\uparrow\rangle_1|\downarrow\rangle_2 + b\,|\downarrow\rangle_1|\uparrow\rangle_2 + c\,|\downarrow\rangle_1|\downarrow\rangle_2 + d\,|\uparrow\rangle_1|\uparrow\rangle_2 , $$

with $|a|^2 + |b|^2 + |c|^2 + |d|^2 = 1$. The probability of getting the spin value $|\uparrow\rangle_2$
at SGM-R is $|b|^2 + |d|^2$. We now do a measurement, first on the left side in an
arbitrarily chosen direction $\gamma$ at SGM-L, where $\gamma$ is the angle between the z-direction
and the chosen direction. That is the freedom the experimenter has, and with which
the experimenter can hope to affect the outcome on the right-hand side. Expressing
the z-spin basis vectors in the corresponding $\gamma$-basis,

$$ |\uparrow\rangle_1 = i_1\cos\gamma + j_1\sin\gamma , \qquad |\downarrow\rangle_1 = -i_1\sin\gamma + j_1\cos\gamma , $$

we rewrite the above state as

$$
\begin{aligned}
\psi &= i_1\big[\cos\gamma\,\big(a|\downarrow\rangle_2 + d|\uparrow\rangle_2\big) - \sin\gamma\,\big(b|\uparrow\rangle_2 + c|\downarrow\rangle_2\big)\big] \\
&\quad + j_1\big[\sin\gamma\,\big(a|\downarrow\rangle_2 + d|\uparrow\rangle_2\big) + \cos\gamma\,\big(b|\uparrow\rangle_2 + c|\downarrow\rangle_2\big)\big] \\
&= \psi_{i_1} + \psi_{j_1} ,
\end{aligned}
$$

from which we read off that $\|\psi_{i_1}\|^2$ and $\|\psi_{j_1}\|^2$ are the probabilities for the outcomes
spin up or spin down when measuring first at SGM-L($\gamma$). That measurement
will “produce” a collapse of the entangled state. The collapse is the nonlocal effect
which could be the source for nonlocal signalling. The collapsed wave function
will be the one in the support of which the particle is located after leaving SGM-L($\gamma$),
i.e., it will be either $\psi_{i_1}/\|\psi_{i_1}\|$ or $\psi_{j_1}/\|\psi_{j_1}\|$, depending on the outcome at
SGM-L($\gamma$).

And what is now the effect of this measurement on the probability for the outcome
at SGM-R? To find out, we now compute the quantum equilibrium probability
that the particle, when going through SGM-R (oriented in the z-direction), is in the
support of, let us say, the spin-up wave function $|\uparrow\rangle_2$. Repeating the argument now
with the new wave function $\psi_{i_1}/\|\psi_{i_1}\|$, the probability will be


3 That quantum equilibrium prevents superluminal signalling is taken, however, as a motivation for 
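research on quantum non-equilibrium [7].

$$ \frac{\big\| i_1|\uparrow\rangle_2\,(d\cos\gamma - b\sin\gamma)\big\|^2}{\|\psi_{i_1}\|^2} , $$

while for $\psi_{j_1}/\|\psi_{j_1}\|$, it will be

$$ \frac{\big\| j_1|\uparrow\rangle_2\,(b\cos\gamma + d\sin\gamma)\big\|^2}{\|\psi_{j_1}\|^2} . $$

From this, we obtain the probability for the outcome spin up on SGM-R(z) by summing
the “joint probabilities”

$$ \|\psi_{i_1}\|^2\,\frac{\big\| i_1|\uparrow\rangle_2\,(d\cos\gamma - b\sin\gamma)\big\|^2}{\|\psi_{i_1}\|^2} + \|\psi_{j_1}\|^2\,\frac{\big\| j_1|\uparrow\rangle_2\,(b\cos\gamma + d\sin\gamma)\big\|^2}{\|\psi_{j_1}\|^2} , $$

which yields

$$ |b|^2 + |d|^2 . $$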


There is therefore no effect on the statistics of the outcomes on the right-hand side. 
They are the same, whether or not a measurement on SGM-L takes place first. 
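
The independence of the right-hand statistics from the angle chosen on the left can be checked numerically. The sketch below is only an illustration under stated assumptions: the amplitudes a, b, c, d are drawn at random and normalized, with b and d taken to multiply the terms in which particle 2 has spin up, as in the probability $|b|^2 + |d|^2$ quoted above. The function names are hypothetical.

```python
import math
import random

def right_up_probability(a, b, c, d, gamma):
    """Probability of spin up at SGM-R when SGM-L(gamma) is measured first.

    Convention (an assumption of this sketch): a and d multiply the terms with
    particle 1 up, b and c those with particle 1 down; b and d multiply the
    terms with particle 2 up.
    """
    cg, sg = math.cos(gamma), math.sin(gamma)
    # Components of the two left-outcome branches along |up>_2 and |down>_2.
    i_up, i_down = d * cg - b * sg, a * cg - c * sg
    j_up, j_down = b * cg + d * sg, a * sg + c * cg
    total = 0.0
    for up, down in ((i_up, i_down), (j_up, j_down)):
        branch_norm = abs(up) ** 2 + abs(down) ** 2  # probability of this left outcome
        if branch_norm > 0.0:
            conditional = abs(up) ** 2 / branch_norm  # |psi|^2 rule in the collapsed state
            total += branch_norm * conditional        # joint probability
    return total

def random_amplitudes(rng):
    """Four complex amplitudes normalized to unit total weight."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in amps))
    return [z / norm for z in amps]

rng = random.Random(0)
a, b, c, d = random_amplitudes(rng)
no_measurement = abs(b) ** 2 + abs(d) ** 2
after_left = [right_up_probability(a, b, c, d, g) for g in (0.0, 0.4, 1.3, 2.9)]
print(no_measurement, after_left)
```

For every angle the summed joint probability agrees with $|b|^2 + |d|^2$ up to rounding, because the cross terms in $\cos\gamma\sin\gamma$ cancel between the two branches.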

The key property we have used here is that we can infer from the “joint distribu- 
tion” the probability for the outcome on the right by summing the joint probability 
over the possible values of the left outcome. We shall learn in Chap. 12 that the 
shorthand notation for this is that “observables” commute. The commutation of the 
spin observables on the left and on the right is in this sense an expression of the 
fact that one can perform “local operations” on the quantum system, i.e., that the 
pieces of apparatus SGM-L and SGM-R are decoupled, meaning that they function 
independently of one another.


Chapter 11
The Wave Function and Quantum Equilibrium 


After introducing Bohmian mechanics we described how macroscopic physics 
emerges from it, and how the microscopic wave function of a system determines 
probabilities for pointer positions. All this is based on the quantum equilibrium hy- 
pothesis. We used the notion “wave function of a system” rather loosely, without 
scrutinizing its meaning in any depth. In this chapter, we shall complete that description
and justify the quantum equilibrium hypothesis. The basic idea of how to approach
this justification has been presented in Chap. 4. We need to show that the
empirical distribution of configurations is typically close to the quantum equilibrium 
distribution, which has been established experimentally to be empirically adequate. 
This can be done with surprising ease, and Bohmian mechanics thus recommends 
itself as the paradigm for Boltzmann’s view of chance in physics. 


11.1 Measure of Typicality 


Equations (8.3) and (8.4) define Bohmian mechanics for an N-particle system which 
has no environment to interact with. It is an N-particle Bohmian universe. Following 
Boltzmann’s understanding of chance in physics, the justification of the quantum 
equilibrium hypothesis must begin with a Bohmian universe, huge enough to allow 
for many subsystems, so that one can form an ensemble of subsystems. This allows 
for empirical distributions and statistical testing. We start with 


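$$ (\mathcal{Q}, \Phi^\Psi_t, \mathbb{P}^\Psi) \tag{11.1} $$

as dynamical system with $\mathcal{Q}$ as configuration space, and $\Psi$ as the wave function of
the universe generating $\Phi^\Psi_t$, the Bohmian flow on $\mathcal{Q}$:

$$ \forall t\in\mathbb{R} , \qquad \Phi^\Psi_t(q) = Q(t,q) = \text{solution of (8.3)} . $$

$\mathbb{P}^\Psi$ is the equivariant measure, the quantum equilibrium measure. This means the
following. Let $\Psi_t$ be the solution of (8.4) with initial condition $\Psi$. An equivariant
measure is a measure $\mathbb{P}^\Psi$, depending on $\Psi$ in a particular way, viz.,

$$ \mathbb{P}^\Psi_t(A) = \mathbb{P}^\Psi\circ(\Phi^\Psi_t)^{-1}(A) = \mathbb{P}^\Psi\big((\Phi^\Psi_t)^{-1}(A)\big) = \mathbb{P}^{\Psi_t}(A) , \tag{11.2} $$

where the last equality expresses the condition for equivariance. In terms of expectation
values of a function $f:\mathcal{Q}\to\mathbb{R}$,

$$ \mathbb{E}^\Psi\big(f(Q(t))\big) = \mathbb{E}^{\Psi_t}(f) . \tag{11.3} $$

This means that the mapping from $\Psi$ to $\mathbb{P}^\Psi$ is invariant under the time evolution
$\mathbb{P}^\Psi\to\mathbb{P}^{\Psi_t}$. Diagrammatically,

$$
\begin{array}{ccc}
\Psi & \longrightarrow & \mathbb{P}^{\Psi} \\
U_t \big\downarrow & & \big\downarrow \circ(\Phi^\Psi_t)^{-1} \\
\Psi_t & \longrightarrow & \mathbb{P}^{\Psi_t}
\end{array}
$$

where $U_t$ denotes the evolution $\Psi_t = U_t\Psi$ according to Schrödinger’s equation and
$\circ(\Phi^\Psi_t)^{-1}$ stands for the flow map defining the time evolution of the measure
$\mathbb{P}^\Psi_t := \mathbb{P}^\Psi\circ(\Phi^\Psi_t)^{-1}$ along the Bohmian flow.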

Equivariance generalizes stationarity and defines the quantum equilibrium mea- 
sure, the measure singled out by the dynamical law itself, and which defines typical- 
ity. The equivariance property ensures that typicality is time independent. The gen- 
eralization to equivariance is required, because stationarity has no meaning when 
the velocity field $v^\Psi(q,t)$ [see (8.1)] depends on time, since the wave function is
time dependent. Finding an equivariant measure is potentially a hard task! In classical
mechanics the stationary measure was easy to find because the divergence
of the Hamiltonian vector field on phase space is zero (Liouville’s theorem). No
such property holds for the Bohmian vector field. Nevertheless, thanks to Born
and Schrödinger, we already know the equivariant measure $\mathbb{P}^\Psi$ [see (7.11) (7.14)],
namely

$$ \mathbb{P}^\Psi(A) = \int_A |\Psi(q)|^2\, d^{3N}q , \tag{11.4} $$

normalized to unity,

$$ \int |\Psi(q)|^2\, d^{3N}q = 1 . \tag{11.5} $$
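
As a reminder of why the $|\Psi|^2$ density is equivariant, here is a brief sketch of the standard argument. It assumes the quantum continuity equation that follows from Schrödinger's equation, and that the Bohmian velocity field is the quantum flux $j^{\Psi_t}$ divided by $|\Psi_t|^2$; the symbols $\rho_t$ and $j^{\Psi_t}$ are introduced here for illustration only.

```latex
% rho_t : density of the measure transported along the Bohmian flow
% j^{Psi_t} : quantum flux, with v^{Psi_t} = j^{Psi_t} / |Psi_t|^2
\begin{align*}
\partial_t |\Psi_t|^2 + \nabla_q\cdot j^{\Psi_t} &= 0
  && \text{(continuity equation from Schr\"odinger's equation)} \\
\partial_t \rho_t + \nabla_q\cdot\big(v^{\Psi_t}\rho_t\big) &= 0
  && \text{(transport of a density along the flow)} \\
\rho_0 = |\Psi|^2 \;&\Longrightarrow\; \rho_t = |\Psi_t|^2 \ \text{ for all } t
  && \text{(both solve the same equation, since } v^{\Psi_t}|\Psi_t|^2 = j^{\Psi_t}\text{)}
\end{align*}
```

The last line is the equivariance condition (11.2) expressed in terms of densities.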


Of course, we have no idea what the universal wave function looks like. Is it time de- 
pendent? One reason to think that the wave function of the universe is not stationary 
is macroscopic irreversibility, so that the wave function can be held responsible for 
the global non-equilibrium character of the universe. It is not unreasonable to think 
that the non-equilibrium character of the universe is encoded in a special initial wave 
function of the universe (for a discussion of typicality of wave functions, see [1]).
Assuming a special initial wave function allows one to separate the justification of
the statistical hypothesis for the Bohmian particles from the issue of macroscopic 
irreversibility, which then resides solely in the wave function. 

Note, however, that the assumption of a non-equilibrium universal wave function 
is plausible, but not necessary. To understand this better, suppose the universal wave 
function were stationary, not evolving in time. How is it possible then for things 
to move around? In orthodox quantum mechanics the world would be forever still. 
Not so in Bohmian mechanics. If the wave function contains a nontrivial phase, 
Bohmian particles move around and a world like ours is still possible. We give 
a simple example of a time-evolving Bohmian world with a stationary universal 
wave function in Remark 11.1. The irreversibility we experience in all macroscopic 
processes will then have to be explained by a special configuration of the Bohmian 
particles. 

Readers who have absorbed the following sections may come back to this point 
and wonder how much of the following analysis remains valid when the wave func- 
tion of the universe is stationary. Presumably the result will no longer be as easy 
to prove, but it might be worthwhile noting that there is no reason to think that 
the set of special initial configurations which yield a macroscopic non-equilibrium 
universe like ours is also the set of atypical Bohmian configurations for which the 
quantum equilibrium hypothesis does not hold. In other words, conditioning the 
quantum equilibrium measure on the set of initial conditions responsible for ther- 
mal non-equilibrium may imply that the quantum equilibrium hypothesis typically 
holds. Since we lack a good understanding of what the wave function of the universe 
looks like, these last remarks are a subject for future research, and we turn now to 
more modest and practical questions. 


11.2 Conditional Wave Function 


Given the Bohmian universe, how does one describe a subsystem? Asking the same 
question for a Newtonian universe, the answer is clear: just apply the Newtonian 
laws to the subsystem. Thinking about this for a moment, one understands that the 
answer is based on the possibility that influences from outside the system are neg- 
ligible. If we throw a stone, it is not just the Newtonian laws of the arm giving 
momentum to the stone, but also the gravitational interaction between the stone and 
the earth which is relevant for its motion; but by all means forget the mass of the 
sun! It is too far away! 

However, in Bohmian mechanics “too far away” has no obvious meaning. The 
universal wave function is a function on the configuration space of the universe. 
What does it mean to neglect “distant Bohmian bodies”? We need to analyze this. 
The last chapter already tells us that “far away” is not the essential feature on which 
an autonomous description of a subsystem can be based. It is rather a product structure
of the wave function in conjunction with fapp-impossibility of interference.

We consider an m-dimensional subsystem (x-system) of particles given by their
configuration $X$, and denote the n-dimensional particle configuration of the rest of
the universe by $Y$, i.e., the total environment of the x-system is the y-system. The
universal configuration therefore splits according to

$$ Q = (X,Y) \qquad \big(q = (x,y)\big) . \tag{11.6} $$


The x-system we should have in mind now is a physical system which one studies 
in a laboratory. That means that the experimenter is part of the environment. Ev- 
erything the experimenter learns, writes down, or otherwise secures belongs to Y. 
From a macroscopic point of view, we may say that we know the “relevant region” 
of configuration space in which Y lies quite well. What we do not know is the wave 
function of the universe $\Psi$. We need a concept to describe the x-system in Bohmian
terms. Since we already have the Bohmian positions, all that is needed is the notion 
of the wave function for the system. 
The Bohmian equation for the x-system is

$$ \dot X(t) = v^\Psi_x\big(X(t),Y(t)\big) \sim \mathrm{Im}\,\frac{\nabla_x\Psi\big(x,Y(t)\big)}{\Psi\big(x,Y(t)\big)}\bigg|_{x=X(t)} , $$

suggesting the definition of a conditional wave function for the x-system:

$$ \varphi^Y(x) = \frac{\Psi(x,Y)}{\|\Psi(Y)\|} , \tag{11.7} $$

with the norm

$$ \|\Psi(Y)\| = \Big(\int |\Psi(x,Y)|^2\, d^m x\Big)^{1/2} . $$

Hence,

$$ \dot X(t) = v^{\varphi^{Y(t)}}\big(X(t)\big) . $$

We obtain the conditional wave function by replacing the y-part in the configuration
coordinate $q = (x,y)$ of the universal wave function by the actual configuration $Y$
and then normalizing. The conditional wave function is in general unknown and
it will not generally evolve according to a Schrödinger equation for the x-system.
However, look at the example in Remark 11.1.
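
To make the recipe "insert the actual Y, then normalize" concrete, here is a minimal numerical sketch. The two-particle wave function below is a made-up toy example (a sum of two Gaussian product packets) and is not taken from the text; the grid and the simple Riemann sum for the norm are likewise assumptions of this illustration.

```python
import math

def gauss(u, width=1.0):
    """Real Gaussian amplitude whose square integrates to one."""
    return math.exp(-u * u / (2 * width ** 2)) / (math.pi ** 0.25 * math.sqrt(width))

def universal_psi(x, y):
    """Toy entangled wave function: two packets well separated in y (hypothetical)."""
    return gauss(x - 1.0) * gauss(y + 5.0) + gauss(x + 1.0) * gauss(y - 5.0)

def conditional_wave_function(psi, Y, xs):
    """Values of psi(x, Y) on the grid xs, divided by the L2 norm in x."""
    dx = xs[1] - xs[0]
    values = [psi(x, Y) for x in xs]
    norm = math.sqrt(sum(abs(v) ** 2 for v in values) * dx)
    return [v / norm for v in values]

xs = [-10 + 0.01 * n for n in range(2001)]
phi = conditional_wave_function(universal_psi, 5.0, xs)

# With Y inside the second packet, the conditional wave function is
# essentially the x-factor of that packet, a Gaussian centered at x = -1.
deviation = max(abs(p - gauss(x + 1.0)) for p, x in zip(phi, xs))
print(deviation)
```

With the actual environment configuration placed at Y = 5, the first packet contributes a factor of order exp(-50) and the conditional wave function coincides with the x-factor of the second packet to high accuracy. Moving Y to -5 would select the other packet instead.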

On the other hand, the conditional wave function connects directly with the con- 
ditional quantum equilibrium measure. Suppose we would like to make a typicality 
statement about the x-system. Which measure is relevant to that purpose? Since the 
environment of the x-system is macroscopically factual, i.e., the laboratory and the 
experimenter at work in it are facts, we must condition the quantum equilibrium 
measure on these facts.

The conditional measure $P(A|B)$ of $A$ given the event $B$ is simply the measure
restricted to $B$ and appropriately renormalized:

$$ P(A|B) = P(A\cap B)/P(B) . $$

When we condition on a set $b$ of measure zero, which will be the case relevant for
us, since we wish to condition on the macroscopic facts encoded in the point $Y$, we
can consider the limit of $P(A\cap B)/P(B)$ as $P(B)\to P(b) = 0$. This exists when the
measure has a density, as is the case for

$$ \mathbb{P}^\Psi(d^m x, d^n y) = |\Psi(x,y)|^2\, d^m x\, d^n y . $$

The conditional measure is then simply given by

$$ \mathbb{P}^\Psi\big(\{Q = (X,Y),\, X\in d^m x\}\,\big|\,Y\big) = \frac{|\Psi(x,Y)|^2\, d^m x}{\int |\Psi(x,Y)|^2\, d^m x} = |\varphi^Y(x)|^2\, d^m x . \tag{11.8} $$


In (11.8) the specification of the environment to the configuration Y is much too 
specific for the formula to be applicable in relevant physical situations. We only 
know a few macroscopic facts about Y, so the conditioning on Y seems ridiculous. 
However, we can gain a valuable formula from (11.8) by making the following 
observation. We can collect all Qs which yield the same conditional wave function 
for the x-system into a set, say 


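$$ \{\varphi^Y = \varphi\} = \big\{(X,Y)\in\mathcal{Q}\,\big|\,\varphi^Y(x) = \varphi(x)\big\} . $$

Then use the following simple property of conditional probabilities. Let $B = \bigcup_i B_i$
be a pairwise disjoint partition and let $P(A|B_i) = a$ for all $B_i$. Then by the additivity
of the measure, viz.,

$$ P(B)\,a = \sum_i P(A|B_i)P(B_i) = \sum_i P(A\cap B_i) = P(A\cap B) , $$

and hence $P(A|B) = a$. Therefore,

$$ \mathbb{P}^\Psi\big(\{Q = (X,Y),\, X\in d^m x\}\,\big|\,\{\varphi^Y = \varphi\}\big) = |\varphi|^2\, d^m x , \tag{11.9} $$

which for ease of notation we simply write as

$$ \mathbb{P}^\Psi\big(X\in d^m x\,\big|\,\varphi^Y = \varphi\big) = |\varphi|^2\, d^m x . \tag{11.10} $$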
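
The property of conditional probabilities used here can be illustrated on a finite probability table. The sketch below is a toy check with exact rational arithmetic: the block weights, the common conditional value a = 1/4, and all names are made up for illustration.

```python
from fractions import Fraction

def conditional(prob, A, B):
    """P(A|B) for a finite table prob mapping outcomes to weights."""
    p_B = sum(prob[w] for w in B)
    p_AB = sum(prob[w] for w in B if w in A)
    return p_AB / p_B

# Outcome (i, hit) lies in block B_i; hit=True marks membership in A.
block_weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
a = Fraction(1, 4)  # common value of P(A|B_i), chosen arbitrarily
prob = {}
for i, w in enumerate(block_weights):
    prob[(i, True)] = w * a
    prob[(i, False)] = w * (1 - a)

A = {w for w in prob if w[1]}
blocks = [{w for w in prob if w[0] == i} for i in range(3)]
B = set().union(*blocks)

print([conditional(prob, A, Bi) for Bi in blocks], conditional(prob, A, B))
```

Conditioning on any union of blocks gives the same value a, which is the step that turns (11.8) into (11.9): the finer conditions play the role of the individual configurations Y, the coarser one that of the set on which only the conditional wave function is fixed.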


This formula is crucial for justifying the quantum equilibrium hypothesis. We wish 
to apply it to a situation where the conditional wave function of the x-system does 
not depend (at least for a certain amount of time) on Y. That is what we believe to be 
the case in our world, i.e., that subsystems sometimes behave autonomously. This
brings us then to the concept of a Bohmian subsystem, meaning that the subsystem
has its own wave function, its own Schrödinger evolution, and hence an autonomous
law for Bohmian trajectories. 


11.3 Effective Wave Function


If the universal wave function has a product structure, then in view of (9.5) the
velocity field of the x-system is determined by $\varphi$:

$$ \Psi(x,y) = \varphi(x)\Phi(y) \quad\Longrightarrow\quad v^\Psi_x = v^\varphi . \tag{11.11} $$
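
The implication in (11.11) can be read off in one line. The sketch below assumes the form of the velocity field used for the x-system in Sect. 11.2, i.e., proportional to the imaginary part of the logarithmic x-gradient of the wave function, and writes the product wave function as $\varphi(x)\Phi(y)$.

```latex
\begin{align*}
v^\Psi_x(x,y) \;\sim\; \mathrm{Im}\,\frac{\nabla_x\big(\varphi(x)\Phi(y)\big)}{\varphi(x)\Phi(y)}
  \;=\; \mathrm{Im}\,\frac{\Phi(y)\,\nabla_x\varphi(x)}{\varphi(x)\,\Phi(y)}
  \;=\; \mathrm{Im}\,\frac{\nabla_x\varphi(x)}{\varphi(x)}
  \;\sim\; v^\varphi(x) .
\end{align*}
```

The environment factor cancels wherever it does not vanish, so the velocity of the x-system no longer depends on y.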


But (11.11) is much too special, since any interaction between the x-system and 
the environment will destroy the product structure, leading to an entangled wave 
function as in the measurement process. It is unreasonable to assume that the uni- 
versal wave function has product structure. Generically, it will be a superposition of 
products, i.e., a bona fide entangled wave function. 

But we also know that for macroscopically disjoint wave packets when seen as 
functions of the macroscopic environment configuration y (see Fig. 9.2), 


$$ \Psi(x,y) = \Psi_1(x,y) + \Psi_2(x,y) , \tag{11.12} $$

and only one of the packets will be effective in Bohmian mechanics, either $\Psi_1$ (if
$Y\in\operatorname{supp}\Psi_1$) or $\Psi_2$ (if $Y\in\operatorname{supp}\Psi_2$) [see (9.5)], and we can fapp forget about the
ineffective packet. The idea which leads to the relevant concept of the effective wave 
function of a subsystem comes from combining (11.11) and (11.12). As already 
remarked in Chap. 8, one must take the macroscopic disjointness of the packets in 
(11.12) with a pinch of salt. It will only be approximately satisfied, for example, in 
the sense of L?, which means that 


Pep _» PY apr. 


We introduce the concept of effective wave function for the x-system, which is the 
well-defined expression of the collapsed wave function of orthodox quantum me- 
chanics. The effective wave function is the conditional wave function for a special 
physical situation. The x-system has an effective wave function $\varphi$ if

$$ \Psi(x,y) = \varphi(x)\Phi(y) + \Psi^\perp(x,y) , \tag{11.13} $$

where $\Phi$ and $\Psi^\perp$ have macroscopically disjoint y-supports and in addition

$$ Y\in\operatorname{supp}\Phi . \tag{11.14} $$


It is helpful to recall that the splitting (11.13) happens in the “measurement process”.
As in the discussion of the measurement experiment we can fapp forget the wave
packet $\Psi^\perp$ if the environment is guided by $\Phi$, i.e., if $Y\in\operatorname{supp}\Phi$. We can forget
it for as long as we wish, since interference is fapp impossible (until the universe
comes to an end). In view of $v^{\varphi\Phi}_x = v^\varphi$ [see (11.11)], we see that the x-system will now be guided by
$\varphi$. If the interaction term $V(x,y)\varphi(x)\Phi(y)$ in the Schrödinger equation is negligible
(at least for a certain length of time), the x-system and the environment will be
dynamically decoupled, and $\varphi$ will obey a Schrödinger equation on its own. The
x-system is then an isolated Bohmian system for that period of time. To repeat, the
conditional wave function always exists, and it becomes an effective wave function
when (11.13) holds with (11.14). We stress therefore that the effective wave function
is a mathematically precise concept of the collapsed wave function of orthodox
quantum theory.


Remark 11.1. Stationary Universal Wave Function with Random Conditional Wave 
Function and an Effective Wave Function with Nontrivial Time Dependence 


We consider here a two-particle universe with “masses” $m_x = m$, $m_y = M$, and
$q = (x,y)\in\mathbb{R}^2$. Let the wave function be stationary:

$$ \Psi(x,y) = R_1(x+y)R_2(y)\,e^{ik(x-y)} + R_3(x)R_4(y) , $$

with real functions $R_1,R_2,R_3,R_4$ and $R_2(y) = 0$ for $y > 0$ and $R_4(y) = 0$ for $y < 0$.
According to the equations of Bohmian mechanics, we have

$$ Y_1(t) = Y_0 - \frac{\hbar k}{M}\,t , $$

for $Y_0 < 0$, and

$$ Y_0(t) = Y_0 , $$

for $Y_0 > 0$. Suppose that $\int |R_1|^2 dx = \int |R_3|^2 dx = 1$. Then, with probability
$p_1 = \int R_2^2(y)\,dy$, the conditional wave function for the x-particle is (in the projective sense)

$$
\begin{aligned}
\varphi_1(x,t) &= \frac{\Psi\big(x,Y_1(t)\big)}{\big[\int dx\, R_1\big(x+Y_1(t)\big)^2 R_2\big(Y_1(t)\big)^2\big]^{1/2}} \\
&= c\,R_1\Big(x + Y_0 - \frac{\hbar k}{M}t\Big)\, e^{ik(x - Y_0 + \hbar k t/M)} \\
&= c_1 R_1\Big(x + Y_0 - \frac{\hbar k}{M}t\Big)\, e^{ikx} ,
\end{aligned}
$$

for $Y_0 < 0$, and

$$ \varphi_2(x,t) = \frac{\Psi\big(x,Y_0(t)\big)}{\big[\int dx\, R_3(x)^2 R_4(Y_0)^2\big]^{1/2}} = c_2 R_3(x) , $$

for $Y_0 > 0$. We see that the conditional wave function is random.

Let us now slightly change the focus of the example and consider

$$ \Psi(x,y) = \cos[k(x+y)]\, e^{ik(x-y)} $$

as “universal wave function”. This happens to be a stationary solution of the Hamiltonian

$$ -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} - \frac{\hbar^2}{2M}\frac{\partial^2}{\partial y^2} . $$

The conditional wave function for the x-system is now

$$ \varphi(x,t) = \cos\Big(kx + kY_0 - \frac{\hbar k^2 t}{M}\Big)\, e^{i(kx - kY_0 + \hbar k^2 t/M)} . $$

Observing once again that the overall phase factors depending only on time are
projective, and thus physically irrelevant, we obtain

$$ \varphi(x,t) \cong \tilde\varphi(x,t) = C\Big[e^{i(2kx + 2kY_0 - 2\hbar k^2 t/M)} + 1\Big] . $$

Observe next that the conditional wave function satisfies a Schrödinger equation on
its own, so that we may consider it as an effective wave function, namely that of a
free particle with mass $M$:

$$ i\hbar\frac{\partial}{\partial t}\tilde\varphi = -\frac{\hbar^2}{2M}\frac{\partial^2}{\partial x^2}\tilde\varphi . $$

The examples are interesting because the Bohmian positions and the effective wave
function of a subsystem turn out to be non-trivial functions of time, even though the
universal wave function is stationary (see also [2] for more on the meaning of wave
functions).
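
The algebra of the second example can be checked numerically. The sketch below is an illustration under stated assumptions: units with $\hbar = 1$, arbitrary illustrative values for k, M and $Y_0$, the environment trajectory $Y(t) = Y_0 - \hbar k t/M$ obtained from the phase $k(x-y)$, and the candidate effective wave function $\tilde\varphi(x,t) = \tfrac12\big[e^{i(2kx + 2kY_0 - 2\hbar k^2 t/M)} + 1\big]$. It checks only two things: that $\Psi(x,Y(t))$ differs from $\tilde\varphi$ by a phase independent of x, and that $\tilde\varphi$ solves the free Schrödinger equation with mass M (by finite differences). It does not test the stationarity of the universal wave function.

```python
import cmath
import math

HBAR, K, M, Y0 = 1.0, 0.7, 2.0, 0.3  # illustrative values (assumptions)

def universal_psi(x, y):
    """cos[k(x+y)] exp[ik(x-y)], the wave function of the second example."""
    return math.cos(K * (x + y)) * cmath.exp(1j * K * (x - y))

def y_trajectory(t):
    """Environment position: dY/dt = (hbar/M) Im(d_y Psi / Psi) = -hbar k / M."""
    return Y0 - HBAR * K * t / M

def phi_tilde(x, t):
    """Candidate effective wave function, with the constant C set to 1/2."""
    phase = 2 * K * x + 2 * K * Y0 - 2 * HBAR * K * K * t / M
    return 0.5 * (cmath.exp(1j * phase) + 1)

def schrodinger_residual(x, t, h=1e-4):
    """i hbar d_t phi + (hbar^2 / 2M) d_x^2 phi via central differences."""
    dt = (phi_tilde(x, t + h) - phi_tilde(x, t - h)) / (2 * h)
    dxx = (phi_tilde(x + h, t) - 2 * phi_tilde(x, t) + phi_tilde(x - h, t)) / h ** 2
    return 1j * HBAR * dt + HBAR ** 2 / (2 * M) * dxx

t = 0.9
ratios = [universal_psi(x, y_trajectory(t)) / phi_tilde(x, t) for x in (-1.0, 0.2, 1.5)]
print(ratios, abs(schrodinger_residual(0.4, t)))
```

The three ratios coincide and have modulus one, so the conditional wave function and $\tilde\varphi$ agree in the projective sense, and the finite-difference residual is at the level of rounding error.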


11.4 Typical Empirical Distributions 


We now justify the quantum equilibrium hypothesis as being the theoretical predic- 
tion for typical empirical distributions, which is known as Born’s statistical interpre- 
tation of the wave function, and which we have already addressed in Chap. 8. The 
hypothesis reads as follows. If a subsystem has effective wave function $\varphi$, then its
particle coordinates are $|\varphi|^2$-distributed. What does this mean? According to our
understanding of Boltzmann’s view, we ought to know by now! In an ensemble of
similar subsystems, which all have effective wave function $\varphi$, the relative frequencies
of the configuration coordinates will typically be close to the $|\varphi|^2$-distribution.
We need to prove a law of large numbers!

Let us therefore consider the situation where the x-system consists of many similar
microscopic subsystems $x_1,\dots,x_N$, i.e., where $x = (x_1,\dots,x_N)$. Each of the
$x_i$-systems is assumed to have (simultaneously) the effective wave function $\varphi_i$. If $N$
is not too large (not macroscopically large!), the x-system has effective wave function

$$ \varphi(x_1,\dots,x_N) = \prod_{i=1}^N \varphi_i(x_i) . \tag{11.15} $$

This somewhat remarkable fact can be seen as follows. For each $i$ we have, by virtue
of (11.13) and (11.14),

$$ \Psi(x,y) = \varphi_i(x_i)\Phi_i(y_i) + \Psi_i^\perp(x_i,y_i) , $$

where $\Phi_i$ and $\Psi_i^\perp$ have macroscopically disjoint $y_i$-supports and $Y_i\in\operatorname{supp}\Phi_i$. But
the $x_i$ are microscopically few coordinates and the number $N$ of subsystems is not
too large, so $\Phi_i$ and $\Psi_i^\perp$ must already have macroscopically disjoint y-supports,
where $q = (x_1,\dots,x_N,y)$. Furthermore we have

$$ Y\in\operatorname{supp}\Phi_1\cap\operatorname{supp}\Phi_2\cap\dots\cap\operatorname{supp}\Phi_N . $$

Therefore, for this $Y$ and all $i$,

$$ \Psi(x_1,\dots,x_N,Y) = \varphi_i(x_i)\Phi_i(Y,\hat x_i) , \tag{11.16} $$

with

$$ \hat x_i = (x_1,\dots,x_{i-1},x_{i+1},\dots,x_N) . $$

Hence let us write as an ansatz

$$ \Psi(x_1,\dots,x_N,Y) = \prod_{i=1}^N \varphi_i(x_i)\,\tilde\Phi(Y,x) . $$

Division by $\prod_i\varphi_i(x_i)$ shows that, in view of (11.16),

$$ \tilde\Phi(Y,x) = \tilde\Phi(Y) . $$

So (11.15) is true.

Let us now move on to an ensemble of $N$ subsystems which all have the same
effective wave function $\varphi$. More precisely, we fix the same coordinate frame in all
subsystems and $\varphi$ is the effective wave function relative to that coordinate system.
Each $x_i$-subsystem has coordinates $x_i$, also relative to the chosen coordinate system.
Then by virtue of (11.15), according to (11.8), we obtain for the distribution of the
coordinates $x_1,\dots,x_N$,

$$ \mathbb{P}^\Psi_Y(X_1\in dx_1,\dots,X_N\in dx_N) = \mathbb{P}^\Psi(X_1\in dx_1,\dots,X_N\in dx_N\,|\,Y) = \prod_{i=1}^N |\varphi(x_i)|^2\, dx_i , \tag{11.17} $$

where $Y$ represents the environment of the $(x_1,\dots,x_N)$-system. This formula enables
us to predict the empirical distribution of the coordinates of what we might
refer to as a $|\varphi|^2$-ensemble. Under the measure $\mathbb{P}^\Psi_Y$, according to (11.17), the
coarse-graining functions $(X_i)_{i=1,\dots,N}$ defined on $\mathcal{Q}$ form a Bernoulli sequence of
$|\varphi|^2$-distributed random variables. For such a sequence, we showed at the end of Chap. 4
that the law of large numbers (4.54) holds. Adjusting that assertion to the present
setting, it implies that the relative frequencies of x-coordinates, i.e., the empirical
distribution of the $X_1,\dots,X_N$, is close to the $|\varphi|^2$-distribution for $\mathbb{P}^\Psi_Y$-typical
configurations. This is exactly what the quantum equilibrium hypothesis says. Note that
we acquire good information about the effective wave function, at least about its
modulus squared, via the empirical statistics. The quantum equilibrium hypothesis,
which is now no longer a hypothesis but a theorem, is the link (actually the only
link) between theory and experience.

We formulate the theorem precisely as follows. Suppose that, say, at time $t$ the
x-system consists of $N$ systems with coordinates $x_1,\dots,x_N$ (relative to the same
frame in each system), and that the configurations are $X_t = (X_1,\dots,X_N)$. Suppose
the effective wave function is

$$ \varphi_t(x) = \varphi(x_1)\cdots\varphi(x_N) , $$

and let $Y_t = Y$ be the environmental configuration at that time, in accordance with
the fact that the effective wave function of the ensemble is $\varphi_t(x)$. Then

$$
\begin{aligned}
&\mathbb{P}^\Psi\Big(\Big\{Q : \Big|\frac1N\sum_{i=1}^N f(X_i) - \int f(x)|\varphi(x)|^2 dx\Big| < \varepsilon\Big\}\,\Big|\,Y_t = Y\Big) \\
&\qquad = \mathbb{P}^\Psi_Y\Big(\Big\{Q : \Big|\frac1N\sum_{i=1}^N f(X_i) - \int f(x)|\varphi(x)|^2 dx\Big| < \varepsilon\Big\}\Big) \\
&\qquad = 1 - \delta(\varepsilon,f,N) ,
\end{aligned}
\tag{11.18}
$$

and $\delta(\varepsilon,f,N)\to 0$ for $N\to\infty$.

Think of $f$ as a characteristic function $\chi_A$ in (11.18). Then for a family $\chi_{A_\alpha}$
defining the relative frequencies of the measured values ($\in A_\alpha$) and for $N$ large
enough, we have

$$ \sum_\alpha \delta(\varepsilon,f_\alpha,N) \ll 1 . $$

The bad set of initial configurations $Q$, for which the empirical distribution is not
close to the quantum equilibrium value, has very small $\mathbb{P}^\Psi_Y$-measure. There are so
to speak only a few points $Q\in\mathcal{Q}_Y = \{Q\,|\,Y_t = Y\}$ in the given environment which
fail.
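
The statistical content of (11.18) can be illustrated by a small simulation. This is a sketch under explicit assumptions, not a simulation of Bohmian dynamics: it takes the product form of the conditional measure for granted and draws the N coordinates independently from a $|\varphi|^2$ density, here chosen (hypothetically) as the Gaussian $e^{-x^2}/\sqrt{\pi}$, with f the characteristic function of the interval [0, 1]. The fraction of "bad" ensembles, whose empirical frequency deviates from the $|\varphi|^2$ value by at least $\varepsilon$, plays the role of $\delta(\varepsilon,f,N)$.

```python
import math
import random

SIGMA = math.sqrt(0.5)  # exp(-x^2)/sqrt(pi) is a normal density with variance 1/2
A_LOW, A_HIGH = 0.0, 1.0
BORN_VALUE = 0.5 * (math.erf(A_HIGH) - math.erf(A_LOW))  # integral of |phi|^2 over A

def empirical_frequency(n, rng):
    """Relative frequency of coordinates falling in A for one ensemble of size n."""
    hits = sum(1 for _ in range(n) if A_LOW <= rng.gauss(0.0, SIGMA) <= A_HIGH)
    return hits / n

def bad_fraction(n, eps, trials, rng):
    """Fraction of ensembles whose frequency misses BORN_VALUE by at least eps."""
    bad = sum(1 for _ in range(trials)
              if abs(empirical_frequency(n, rng) - BORN_VALUE) >= eps)
    return bad / trials

rng = random.Random(1)
small_n = bad_fraction(10, 0.05, 200, rng)
large_n = bad_fraction(2000, 0.05, 200, rng)
print(BORN_VALUE, small_n, large_n)
```

For ensembles of size 10 most draws are "bad", while for size 2000 bad draws become rare, in line with $\delta(\varepsilon,f,N)\to 0$ as N grows.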

The reader may have many concerns with this assertion. We shall address two. 
The first may be this: What is special about the equivariance property of the universal
measure $\mathbb{P}^\Psi$ defining typicality? Would another measure $\mathbb{P}$, say the one with
density $|\Psi|^4$ (which is not equivariant), not yield typicality for the empirical distribution
$|\varphi|^4$ by the same argument? In fact, it would, but only at exactly that moment
of time where the measure has the $|\Psi|^4$ density. Since the measure is not equivariant,
its density will soon change to something completely different. If P is supposed to 
be the “initial measure” of typicality at the “initial time” when the universe started, 
then who knows what the measure of typicality will look like today? That measure 
has no special significance. Why that measure and not some other? The equivariant 
measure of typicality on the other hand is special — as in Boltzmann’s way of looking 
at statistical physics. Typicality defined by this measure does not depend on time. 
It is singled out by the physics itself. Time evolution does play a role, although 
so far we have only considered an ensemble at a single time (like tossing 10000 
coins at the same time). We shall say a bit more about repetitions of an experiment 
(ensemble in time, like tossing the same coin 10000 times) later. 

Another concern the reader may have is that the conditioning is much too strong 
and therefore irrelevant. Since we can never know what the exact environmental 
configuration Y is, we should condition on less, indeed condition only on the fact 
that an experiment of the kind we describe has been carried out and whatever else 
seems relevant for the experiment. But we have already observed in (11.10) that 
coarse-graining the conditioning to a set on which only the conditional (here effec- 
tive) wave function is given makes no difference to the right-hand side. The con- 
clusion holds just the same. Moreover, we need the assertion in the strongest pos- 
sible form, which is the way we formulated it. Indeed, we must be sure that further 
knowledge of the environment, for example concerning the history of the ensemble 
system and whatever else we may deem relevant, does not affect the conditional 
distribution, given the effective wave function. Is it relevant that the experimenter 
chose a red tie that morning? Is it relevant that his car had a flat tire on the way to 
work? Who knows beforehand? The assertion (11.18) tells us that all those details 
are irrelevant. 

It is crucial that the conditional measure $\mathbb{P}^\Psi_Y$ of the configurations for which the
empirical statistics deviate from $|\varphi|^2$ should be small. Suppose we could only show
that for the unconditional measure $\mathbb{P}^\Psi$. That would tell us nothing, because the set
of environments which are in accordance with the experiments taking place may
already have small $\mathbb{P}^\Psi$-measure, so this alone could be responsible for the smallness
of the result. This is what happens in classical statistical mechanics. The set of initial 
conditions of the universe we happen to live in has extremely small equilibrium 
measure. Therefore, conditioning is crucial. Without it we would be empty-handed. 

We could even condition — if that were necessary, for example, if the universal 
wave function were stationary — on such special environments as could explain irre- 
versible evolution. But then, to justify the quantum equilibrium hypothesis, it must 
be the case that the set of special initial conditions $\mathcal{S}\subset\mathcal{Q}$ responsible for thermal
non-equilibrium (in Remark 11.1 this could be taken as the set of positive $Y_0$ values)
is not the bad set of $Q$s for which the quantum equilibrium hypothesis fails to
be true. 

We conclude this series of remarks with one more point. Suppose we forget all 
the metaphysics and say that we do not believe in all this talk about the physics 
determining its measure of typicality. But we must still prove a law of large numbers
for empirical distributions, otherwise we have no link between the theory and
experience. In this view the equivariant measure is a highly valuable technical tool,
because this is the measure which allows us to prove the theorem! At time $t$, let
us say today when we do the experiment, any other measure would look so odd
(it would depend on $\Psi_t$ in such an intricate way) that we would have no chance of
proving anything! And if it did not look odd today, then it would look terribly odd 
tomorrow! The equivariant measure always looks the same and what we prove today 
about the empirical distribution will hold forever. 

We also remark that any measure $\mathbb{P}$ which is absolutely continuous with respect
to the equivariant measure $\mathbb{P}^{\Psi}$ ($\mathbb{P}$ and $\mathbb{P}^{\Psi}$ have a density with respect to each other)
defines the same sense of typicality. The observation we have just made about the 
technical advantage of the equivariant measure applies here, too. To prove the law 
of large numbers with another measure from the equivalence class of measures of 
typicality would be an awkward thing to do, since it changes its form all the time, 
and would look so odd that we would have no chance of proving anything. In other 
words, equivariance is also technically crucial! 

But now on to more practical concerns! So far our statistical analysis has been 
restricted to a spatially distributed ensemble $(x_1, \dots, x_N)$ at one time. But what
really happens in experiments is that they are repeated. For example, in the two slit
experiments, one sends a beam of particles through the slit and thereby creates an 
ensemble of independent subsystems. But it is an ensemble distributed over time 
(like letting balls drop through the Galton board). One can actually handle this too 
[3], but it is definitely more complicated. Here there is a subtlety that must be taken 
into account in the analysis of time ensembles. In the universe the times at which 
the experiments are done are also “random”, i.e., functions of Q, like the spatial 
locations of the systems. 

For example, suppose an experimenter, eager to win the Nobel prize, starts an 
experiment which is supposed to measure the EPR correlations in a very fine way. 
Suppose the experiment shows in the first 100 runs that, when a particle is registered 
on the left, no particle is registered on the right. That makes the experimenter so 
upset that he destroys the laboratory in a fit of anger. No further experiment is done. 
Alternatively, the experimenter rethinks the experimental setup, finds a problem, and 
calls the repair man, only to find that he is away on holiday. The moral is that the 
times when the experiments are done are random! That must be taken into account 
in the mathematics. 

And that is not all. In the single time ensemble, all we need to look at are the ac- 
tual configurations. No measurement talk is needed. In the time ensemble, however, 
we must take into account the fact that measurements do take place. The position of 
the electron in the ground state does not change in time. For $X(t_1), X(t_2), \dots, X(t_N)$
in the ground state $\varphi$, we have $X(t_1) = X(t_2) = \dots = X(t_N)$. In other words, the
random variables are not at all independent. By measuring the ground state distri- 
bution of a hydrogen atom, we disturb the state, e.g., we ionize the electron, let it 
fall back, ionize again, let it fall back and so on. It is important here that we ionize 
it and then let it fall back again. The positions after settling back in the ground state 
are independent random variables. It is not difficult to see why this is so. Imagine
$N$ pointer wave functions $\Phi_1, \dots, \Phi_N$ with configurations $Z_1, \dots, Z_N$, which
indicate the measured values of the positions of the electron in the ground state. At the
end of the day we have a product wave function of pointer positions $\prod_i \Phi_i$ for the
measured values $X_1, \dots, X_N$, i.e., the distribution of the measured values is again a
product distribution, and so we have independence. Therefore the law of large num- 
bers applies as before. The principle which underlies the “many-times analysis” 
should thus be reasonably clear, even though the precise analysis is, as mentioned, 
somewhat demanding [3]. 

We have understood and justified the quantum equilibrium hypothesis. It is in 
fact Born’s statistical law for the wave function, sometimes called the Born rule, 
or Born interpretation of the wave function, and we understand that the wave
function $\varphi$ one talks about is the conditional or effective wave function. The hypothesis
concerns the empirical distribution of coordinates of particles in an ensemble. The
justification tells us that the quantum equilibrium distribution $\rho = |\varphi|^2$ is what we
should always be experiencing. That is all. 

But some more lessons can nevertheless be learned. When we know, say by mea- 
suring the position of a particle, that the particle is in some spatial region, then we 
know by the quantum equilibrium distribution that the effective wave function will 
have its support in that region. If that region happens to be tiny, then the effective 
wave function will be sharply localized. If the wave function is sharply localized, its 
Fourier decomposition into plane waves will involve a great spread of wave num- 
bers k. Suppose the wave function now evolves freely. The plane wave packets will 
move apart due to the dispersion relation, and eventually separate. Depending on the 
exact initial position of the Bohmian particle, this particle will eventually be guided 
by one of the almost plane wave packets, i.e., it will eventually move along a straight 
line. We explained that in Sect. 9.4. 

The initial randomness of the particle position translates into the randomness of
the particle’s asymptotic velocity, whose distribution is given by the modulus squared
of the Fourier transform of the initial localized wave packet. That distribution is all the
more spread out, the more sharply the initial wave packet is localised. This is Heisenberg’s
uncertainty relation. Obviously, the relation is a direct consequence of the quantum
equilibrium distribution, i.e., Born’s statistical law.
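
The trade-off between localization and velocity spread can be checked numerically. The following is a minimal sketch, assuming a one-dimensional Gaussian wave packet (our choice, not the text's) and using plain Riemann sums for the Fourier integral; the function names `gaussian`, `fourier`, `std_of_density` and `widths` are hypothetical helpers introduced only for this illustration.

```python
import cmath
import math


def gaussian(x, sigma):
    """Normalized Gaussian wave function whose |.|^2 has standard deviation sigma."""
    return (2 * math.pi * sigma**2) ** -0.25 * math.exp(-x * x / (4 * sigma**2))


def fourier(phi_values, xs, dx, k):
    """Riemann-sum approximation of (2 pi)^(-1/2) * integral exp(-i k x) phi(x) dx."""
    total = sum(cmath.exp(-1j * k * x) * p for x, p in zip(xs, phi_values))
    return total * dx / math.sqrt(2 * math.pi)


def std_of_density(points, density):
    """Standard deviation of a (not necessarily normalized) density on a uniform grid."""
    norm = sum(density)
    mean = sum(p * d for p, d in zip(points, density)) / norm
    second = sum(p * p * d for p, d in zip(points, density)) / norm
    return math.sqrt(second - mean * mean)


def widths(sigma, n=200):
    """Return (position spread, wave-number spread) for a Gaussian packet."""
    half_x = 10 * sigma   # position window: ten standard deviations of |phi|^2
    half_k = 5 / sigma    # wave-number window: ten standard deviations of |phi_hat|^2
    dx = 2 * half_x / n
    dk = 2 * half_k / n
    xs = [-half_x + i * dx for i in range(n + 1)]
    ks = [-half_k + i * dk for i in range(n + 1)]
    phi = [gaussian(x, sigma) for x in xs]
    position_density = [p * p for p in phi]
    wavenumber_density = [abs(fourier(phi, xs, dx, k)) ** 2 for k in ks]
    return std_of_density(xs, position_density), std_of_density(ks, wavenumber_density)
```

Calling `widths(0.5)` and `widths(2.0)` shows the sharper packet having the broader wave-number distribution; for a Gaussian the product of the two spreads stays close to 1/2, the minimal value allowed by the uncertainty relation.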

Can we by any clever tricks whatever know more about the particle position than
that it is $|\varphi|^2$-distributed when the effective wave function is $\varphi$? The answer is that
we cannot. That is what the quantum equilibrium hypothesis says, and what we
have proven to be typical. Equilibrium, here quantum equilibrium, entails absolute
uncertainty about the Bohmian positions, beyond the $|\varphi|^2$-distribution.


11.5 Misunderstandings 


We have discussed the justification of the statistical hypothesis in classical statistical 
mechanics and in Bohmian mechanics. The hypothesis concerns the typical empirical
distribution of the values in an ensemble. Typicality is defined via a measure of
typicality. Can the measure of typicality be misunderstood as an empirical distribu- 
tion? Hardly so, because the measure of typicality is a measure on the configuration 
space of the universe, and since we only have access to one universe, namely the one 
we live in, an ensemble of universes is meaningless for physics. It so happens that 
the quantum equilibrium measure of the universe defining typicality has the same
form, namely $|\Psi|^2$, as the predicted empirical distribution of a system with effective
wave function $\varphi$, namely $|\varphi|^2$. Although they look similar, they do not mean
the same thing, since the effective wave function is a fundamentally different object 
from the wave function of the universe. 

We recall that, in classical statistical mechanics, we have the microcanonical 
measure as a measure of typicality, and the empirical distributions of subsystems 
are typically canonical or grand canonical ensembles which look different from the 
microcanonical measure. In quantum equilibrium the situation is simpler. Bohmian 
mechanics is simpler than classical mechanics, and because of that we are able to 
justify the quantum equilibrium hypothesis with great ease. The price to pay is that 
we need to be careful not to treat things which are not the same as being the same. 


11.6 Quantum Nonequilibrium 


The second law of thermodynamics captures irreversibility, and at the same time 
points towards the problem of irreversibility, which is to justify the special atypical 
initial conditions on which, according to Boltzmann, the second law is based. Atypi- 
cal initial conditions (which are synonymous with non-equilibrium) do of course ex- 
ist. So do even grotesquely atypical initial conditions, as in the “Umkehreinwand”. 
While typicality is a clear-cut concept which needs no further justification, atypi- 
cality is tricky, and we (humankind) should consider ourselves lucky that we have 
found the second law of thermodynamics. It tells us that we should not worry about 
grotesquely atypical initial conditions, and it tells us more or less how special the 
initial configurations are. 

One could nevertheless spend one’s time worrying about what very atypical ini- 
tial conditions would produce. As a believer in strong atypicality, one could sit in 
front of a stone and wait for the stone to jump into the air, because in a very atyp- 
ical world, that could happen, now, tomorrow, maybe the day after tomorrow. The 
second law tells us that there are better ways to spend one’s time, but for some 
there may still be the bitter pill to swallow, that the second law is based on non- 
equilibrium. There is no question that we do need to worry about what justifies 
these special initial conditions, although we may safely say that this is a problem 
for future generations to handle. 

The situation in a Bohmian universe is irrevocable. There is no need for a sec- 
ond law for the configurations in Bohmian mechanics. Quantum equilibrium, which 
like equilibrium needs no justification, is fortunately all we need to describe the 
empirical import of Bohmian mechanics. 
Chapter 12
From Physics to Mathematics 


The foundations of quantum mechanics are concerned with Hilbert spaces, linear 
operators on Hilbert spaces, unitary operators, and self-adjoint operators and their 
spectra. The foundations of Bohmian mechanics contain nothing of that sort and 
nothing of that sort seems relevant. Of course, the Schrödinger equation is a partial
differential equation and contains differential operators, but so does the Maxwell–Lorentz
theory of electromagnetism, which one learns about without all those abstract
notions. Why is quantum mechanics different? Why does it need to be based
on such abstract mathematical notions? 

The quantity which determines the empirical import of Bohmian mechanics is 
the effective wave function, the “collapsed” wave packet which guides the particles. 
Its modulus squared gives the statistical distribution of the particle configuration. 
That is all. It seems a meager content. But we shall explain in this chapter why 
the statistical import of Bohmian mechanics, which seems so meager, is in fact ex- 
tremely rich. The quantum formalism in its most general formulation follows from 
it, and so does much more (see [1] for a detailed analysis). Readers who know quan- 
tum mechanics from textbooks will find this chapter to be a revelation. It prepares 
the insight needed for the abstract mathematics to be discussed in the next part of 
the book, the mathematics which is usually viewed as forming the foundations of 
quantum mechanics. 


12.1 Observables. An Unhelpful Notion 


It is natural to think that an observable is a variable that can be observed. In quan- 
tum mechanics, a self-adjoint operator is an observable. Sometimes it is said that one 
measures the observable, which would then mean that one measures the operator. 
For example, one can say that one measures the operator $\hat{A}$. But this is clearly not
intended to mean that one determines the area of the symbol $\hat{A}$. So what is meant?
In fact, something very abstract. And this is an abstraction that one should not be 
surprised about. After all, an operator on a Hilbert space is a very abstract object, 
which has no obvious or direct relation to things going on in physical space. One
should expect the bridge from the physics to the mathematics to be a bold construc- 
tion. But in fact it is not. It is quite the opposite, in many ways rather boring: the 
operator observables of quantum mechanics are book-keeping devices for effective 
wave function statistics. Let us explain that. 

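Consider an experiment $\mathscr{E}$ in which a system (in our usual notation $x$, $m$-dimensional)
and a piece of apparatus ($y$, $n$-dimensional) with discrete pointer states
$\Phi_\alpha$ become entangled. Under an appropriate Schrödinger evolution, the pointer
wave functions get entangled with certain wave functions $\varphi_\alpha$ of the system:

\[
\varphi_\alpha(x)\,\Phi(y) \;\xrightarrow{\ \text{Schrödinger evolution}\ }\; \varphi_\alpha(x)\,\Phi_\alpha(y) , \tag{12.1}
\]

and by linearity we obtain for

\[
\varphi = \sum_\alpha c_\alpha \varphi_\alpha
\]

the result

\[
\varphi(x)\,\Phi(y) = \sum_\alpha c_\alpha \varphi_\alpha(x)\,\Phi(y) \;\xrightarrow{\ \text{Schrödinger evolution}\ }\; \sum_\alpha c_\alpha \varphi_\alpha(x)\,\Phi_\alpha(y) , \tag{12.2}
\]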

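which means that the initial effective wave function $\varphi = \sum_\alpha c_\alpha \varphi_\alpha$ changes with
probability $|c_\beta|^2$ to the effective wave function $\varphi_\beta$. Why is this? Because by virtue of the
quantum equilibrium distribution we have $Y \in \operatorname{supp} \Phi_\beta$ with probability

\[
\begin{aligned}
\mathbb{P}\big(Y \in \operatorname{supp}\Phi_\beta\big)
&= \int_{\{(x,y)\,:\,y \in \operatorname{supp}\Phi_\beta\}} \Big| \sum_\alpha c_\alpha \varphi_\alpha(x)\,\Phi_\alpha(y) \Big|^2 d^m x\, d^n y \\
&= |c_\beta|^2 \int |\varphi_\beta(x)|^2\, d^m x \int |\Phi_\beta(y)|^2\, d^n y = |c_\beta|^2 ,
\end{aligned}
\tag{12.3}
\]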

where we have used $\operatorname{supp} \Phi_\alpha \cap \operatorname{supp} \Phi_\beta \approx \emptyset$ for $\alpha \neq \beta$ and the fact that the wave
functions are normalized to unity. According to our definition of the effective wave
function and our understanding of the fapp collapse from previous chapters, $\varphi_\beta$ is
the new effective wave function of the system, and the wave parts involving $\Phi_\alpha$ with
$\alpha \neq \beta$ can be ignored fapp forever.
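
The computation (12.3) can be mimicked on a finite configuration space, where sums replace the integrals. The sketch below is only an illustration under that assumption; the two-point system, the four-point pointer, the particular coefficients, and the helper name `pointer_probabilities` are all hypothetical choices.

```python
import math


def pointer_probabilities(c, phis, Phis):
    """Probability that Y lies in supp(Phi_beta), for each beta, under the
    discrete density |sum_a c_a phi_a(x) Phi_a(y)|^2 (sums replace integrals)."""
    probabilities = []
    for Phi_beta in Phis:
        support = [y for y, value in enumerate(Phi_beta) if value != 0]
        total = 0.0
        for x in range(len(phis[0])):
            for y in support:
                amplitude = sum(ca * phi[x] * Phi[y] for ca, phi, Phi in zip(c, phis, Phis))
                total += abs(amplitude) ** 2
        probabilities.append(total)
    return probabilities


s = 1 / math.sqrt(2)
system_states = [[s, s], [s, -s]]                   # orthonormal phi_1, phi_2
pointer_states = [[s, s, 0, 0], [0, 0, 0.6, 0.8]]   # normalized, with disjoint supports
coefficients = [0.6, 0.8j]                          # c_1, c_2 with |c_1|^2 + |c_2|^2 = 1

probabilities = pointer_probabilities(coefficients, system_states, pointer_states)
print(probabilities)
```

Because the pointer supports are disjoint, only the term with index $\beta$ survives on $\operatorname{supp}\Phi_\beta$, and the printed values agree with $|c_1|^2$ and $|c_2|^2$ up to rounding.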

The wave functions $\varphi_\alpha$ and probabilities $|c_\alpha|^2$ are associated with the experiment
$\mathscr{E}$ given by (12.1–12.2), and we wish to handle both these aspects in a comfortable
way. To identify the right way, we recall (7.11) and the ensuing discussion, which
says that the Schrödinger evolution preserves the squared norm $\|\cdot\|^2$ (and hence the norm
itself), which is the integrated modulus squared of the wave function (assumed
normalized to unity). The relation (12.2) then implies that

\[
\begin{aligned}
\|\varphi\|^2 &= \int |\varphi(x)\,\Phi(y)|^2\, d^m x\, d^n y \\
&= \int \Big| \sum_\alpha c_\alpha \varphi_\alpha(x)\,\Phi(y) \Big|^2 d^m x\, d^n y \\
&= \int \Big| \sum_\alpha c_\alpha \varphi_\alpha(x)\,\Phi_\alpha(y) \Big|^2 d^m x\, d^n y \\
&= \sum_\alpha |c_\alpha|^2 \int |\varphi_\alpha(x)|^2\, d^m x \int |\Phi_\alpha(y)|^2\, d^n y \\
&= \sum_\alpha |c_\alpha|^2 .
\end{aligned}
\tag{12.4}
\]


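Factoring out and integrating the right-hand side of the second equality, we obtain
by comparison

\[
\sum_{\alpha \neq \beta} c_\alpha^* c_\beta \int \varphi_\alpha^*(x)\,\varphi_\beta(x)\, d^m x = 0 .
\]

Since the $c_\alpha$ can be chosen arbitrarily, we obtain

\[
\int \varphi_\alpha^*(x)\,\varphi_\beta(x)\, d^m x = 0
\]

for $\alpha \neq \beta$. This is reminiscent of the notion of orthogonality when we view

\[
\langle \varphi | \psi \rangle := \int \varphi^*(x)\,\psi(x)\, d^m x \tag{12.5}
\]

as a scalar product on the space of square-integrable wave functions. Hence,

\[
\langle \varphi_\alpha | \varphi_\beta \rangle = \delta_{\alpha\beta} =
\begin{cases}
1 & \text{for } \alpha = \beta , \\
0 & \text{for } \alpha \neq \beta .
\end{cases}
\tag{12.6}
\]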


Why do we find orthogonality for the $\varphi_\alpha$? The answer is of course that $\Phi_\alpha$ and $\Phi_\beta$
are macroscopically disjoint pointer positions! In other words, to actually have an
evolution like (12.1–12.2), the $\varphi_\alpha$ must be orthogonal.

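Now we have the power of unitary geometry at our disposal. We can compute
with the scalar product

\[
c_\alpha = \int \varphi_\alpha^*(x)\,\varphi(x)\, d^m x = \langle \varphi_\alpha | \varphi \rangle
\]

as the orthogonal projection of $\varphi$ onto $\varphi_\alpha$. Let $P_{\varphi_\alpha}$ denote the orthogonal projector
onto that direction,

\[
P_{\varphi_\alpha} \varphi = \varphi_\alpha \langle \varphi_\alpha | \varphi \rangle ,
\]

and assume for the moment that the $\varphi_\alpha$ form a basis, in the sense that we can expand
any $\varphi$ in the form

\[
\varphi = \sum_\alpha P_{\varphi_\alpha} \varphi .
\]

Hence we can associate with the experiment (12.1–12.2) the family of projectors
$(P_{\varphi_\alpha})_\alpha$ with the following properties:

\begin{align}
P_{\varphi_\alpha} \varphi &= \varphi_\alpha \langle \varphi_\alpha | \varphi \rangle , \tag{12.7a} \\
P_{\varphi_\alpha} P_{\varphi_\beta} &= 0 , \quad \text{for } \alpha \neq \beta , \tag{12.7b} \\
P_{\varphi_\alpha}^2 &= P_{\varphi_\alpha} P_{\varphi_\alpha} = P_{\varphi_\alpha} , \tag{12.7c} \\
\sum_\alpha P_{\varphi_\alpha} &= I \quad \text{(unit matrix)} . \tag{12.7d}
\end{align}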


These properties characterize $(P_{\varphi_\alpha})_\alpha$ as a family of orthogonal projectors. We
remark in passing that these projectors are self-adjoint, i.e., $\langle P\varphi | \psi \rangle = \langle \varphi | P\psi \rangle$,
denoted in this book by $P^* = P$.
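
The properties (12.7b–d) and self-adjointness are easy to verify in a finite-dimensional toy case. The sketch below assumes a two-dimensional space with a hypothetical orthonormal pair `phi_1`, `phi_2`; the small matrix helpers are written out so that the block is self-contained.

```python
import math


def projector(phi):
    """Matrix of P_phi, which maps psi to phi <phi|psi> (phi assumed normalized)."""
    return [[pj * pk.conjugate() for pk in phi] for pj in phi]


def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def adjoint(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


s = 1 / math.sqrt(2)
phi_1 = [s, 1j * s]     # hypothetical orthonormal pair in C^2
phi_2 = [s, -1j * s]
P1, P2 = projector(phi_1), projector(phi_2)

zero = [[0, 0], [0, 0]]
identity = [[1, 0], [0, 1]]
total = [[P1[i][j] + P2[i][j] for j in range(2)] for i in range(2)]

print(close(matmul(P1, P2), zero))    # (12.7b): mutual orthogonality
print(close(matmul(P1, P1), P1))      # (12.7c): projector property
print(close(total, identity))         # (12.7d): the projectors sum to the identity
print(close(adjoint(P1), P1))         # self-adjointness, P* = P
```

All four checks hold for this pair; replacing `phi_2` by a vector that is not orthogonal to `phi_1` makes the first and third checks fail, which is the finite-dimensional shadow of the role played by disjoint pointer supports.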

Suppose now that the pointer points to values (numbers displayed by the apparatus)
$\{\lambda_1, \dots, \lambda_N\}$, and suppose that $Y \in \operatorname{supp} \Phi_\beta$ means that the value $\lambda_\beta$
is pointed at. The experiment is thus also characterized by the displayed values
$\lambda_\alpha \in \Lambda = \{\lambda_1, \dots, \lambda_N\}$. The quantum equilibrium statistics translate to the statistical
distribution of the $\lambda$ values, and one may want to know the average value and
variance of the displayed $\lambda$ values in the long run (repeating the experiment many
times). The answer is encoded in one operator, namely,

\[
\hat{A} = \sum_\alpha \lambda_\alpha P_{\varphi_\alpha} , \tag{12.8}
\]

the quantum Swiss army knife, containing all that we need, fapp.

To understand this better we need some mathematical facts, and these will all
be detailed in the coming chapters on mathematics. First, $\hat{A}$ inherits self-adjointness
from the projectors. The relation between a self-adjoint operator and the family of
projectors is one-to-one and called the spectral theorem. This is trivial from right to
left in (12.8), but from left to right one needs some linear algebra, and in general one
needs the infinite-dimensional version of linear algebra called functional analysis.
The $\lambda_\alpha$ are eigenvalues and the projector $P_{\varphi_\alpha}$ can be defined as the characteristic
function of $\hat{A}$ for the value $\lambda_\alpha$:

\[
P_{\varphi_\alpha} = \chi_{\{\lambda_\alpha\}}(\hat{A}) .
\]


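We can now express “everything of interest” in terms of $\hat{A}$. The probability for the
value $\lambda_\alpha$, computed in (12.3), can be expressed in various ways using (12.7a–d):

\[
\begin{aligned}
\mathbb{P}^{\varphi}(\lambda_\alpha) = |c_\alpha|^2 &= |\langle \varphi_\alpha | \varphi \rangle|^2 \\
&= \langle \varphi | \varphi_\alpha \rangle \langle \varphi_\alpha | \varphi \rangle \\
&= \langle \varphi | P_{\varphi_\alpha} \varphi \rangle = \langle \varphi | \chi_{\{\lambda_\alpha\}}(\hat{A})\, \varphi \rangle \\
&= \langle P_{\varphi_\alpha} \varphi | P_{\varphi_\alpha} \varphi \rangle \\
&= \| P_{\varphi_\alpha} \varphi \|^2 .
\end{aligned}
\tag{12.9}
\]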


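The probability of the sure event is unity:

\[
\mathbb{P}^{\varphi}(\Lambda) = \sum_\alpha \mathbb{P}^{\varphi}(\lambda_\alpha) = \sum_\alpha \langle \varphi | P_{\varphi_\alpha} \varphi \rangle
= \Big\langle \varphi \Big| \sum_\alpha P_{\varphi_\alpha} \varphi \Big\rangle = 1 . \tag{12.10}
\]
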
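The mean value of the $\lambda$ values is

\[
\begin{aligned}
\mathbb{E}^{\varphi}(\lambda) &= \sum_\alpha \lambda_\alpha \mathbb{P}^{\varphi}(\lambda_\alpha) \\
&= \sum_\alpha \lambda_\alpha \langle \varphi | P_{\varphi_\alpha} \varphi \rangle \\
&= \Big\langle \varphi \Big| \sum_\alpha \lambda_\alpha P_{\varphi_\alpha} \varphi \Big\rangle \\
&= \langle \varphi | \hat{A} \varphi \rangle ,
\end{aligned}
\tag{12.11}
\]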


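and the second moment of the $\lambda$ values, from which the variance follows by
subtracting the squared mean, is

\[
\begin{aligned}
\mathbb{E}^{\varphi}(\lambda^2) &= \sum_\alpha \lambda_\alpha^2\, \mathbb{P}^{\varphi}(\lambda_\alpha) \\
&= \Big\langle \varphi \Big| \sum_\alpha \lambda_\alpha P_{\varphi_\alpha} \sum_\beta \lambda_\beta P_{\varphi_\beta} \varphi \Big\rangle \qquad [\text{by (12.7b)}] \\
&= \Big\langle \varphi \Big| \Big( \sum_\alpha \lambda_\alpha P_{\varphi_\alpha} \Big)^2 \varphi \Big\rangle \\
&= \langle \varphi | \hat{A}^2 \varphi \rangle .
\end{aligned}
\tag{12.12}
\]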


We emphasize the use of (12.7b) in the computation of the variance. It makes
the book-keeping operator $\hat{A}$ in (12.8) technically powerful, as a result of its
self-adjointness, if one treats the operator as given a priori. From the way in which $\hat{A}$ arises,
it is automatically self-adjoint, because the display is made up of real numbers. If
someone put imaginary units in front of the numbers in the display, the values would
be imaginary and the book-keeping operator would no longer be self-adjoint.
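
As a numerical cross-check of (12.11) and (12.12), one can compare the statistics computed directly from the probabilities $|c_\alpha|^2$ with those computed from the book-keeping operator. The sketch below assumes a two-dimensional toy model; the basis, the displayed values and the coefficients are hypothetical, and `bookkeeping_operator` is a helper name introduced here.

```python
import math


def inner(u, v):
    """Scalar product <u|v>, conjugate-linear in the first argument as in (12.5)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def apply(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def bookkeeping_operator(values, basis):
    """Matrix of A = sum_a lambda_a P_a, where P_a projects onto basis[a]."""
    n = len(basis[0])
    return [[sum(lam * phi[j] * phi[k].conjugate() for lam, phi in zip(values, basis))
             for k in range(n)] for j in range(n)]


s = 1 / math.sqrt(2)
basis = [[s, 1j * s], [s, -1j * s]]   # orthonormal phi_1, phi_2
values = [-1.0, 3.0]                  # hypothetical displayed values lambda_1, lambda_2
c = [0.6, 0.8]                        # coefficients of phi in that basis
phi = [c[0] * basis[0][j] + c[1] * basis[1][j] for j in range(2)]

# Statistics computed directly from the probabilities |c_a|^2 ...
probabilities = [abs(inner(b, phi)) ** 2 for b in basis]
mean_direct = sum(lam * p for lam, p in zip(values, probabilities))
second_direct = sum(lam**2 * p for lam, p in zip(values, probabilities))

# ... and from the operator, as in (12.11) and (12.12).
A = bookkeeping_operator(values, basis)
mean_operator = inner(phi, apply(A, phi)).real
second_operator = inner(phi, apply(A, apply(A, phi))).real
variance = second_operator - mean_operator**2
```

Both routes give the same numbers: a mean of 1.56 and a second moment of 6.12, hence a variance of 3.6864, up to rounding. The agreement of the second moments is exactly where the orthogonality (12.7b) enters.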

Let us collect together these results. We may associate with the experiment $\mathscr{E}$
a family of orthogonal projectors $P_{\varphi_\alpha}$ and values $\lambda_\alpha \in \{\lambda_1, \dots, \lambda_N\}$, both encoded
in $\hat{A}$ through (12.8). In short $\mathscr{E} \longrightarrow \hat{A}$, where the association denoted by the arrow
means that we can use $\hat{A}$ to compute the statistics of the values when the experiment
is repeated many times. Now we may understand why the idea of an operator
observable has become such a dominant notion in quantum physics. In the experiment,
pointers point to numbers and a situation where pointers point to numbers has, of
course, measurement appeal.

Now, turning things upside-down, any self-adjoint operator $\hat{A}$ (think of it as a
Hermitian matrix right now) uniquely defines a family of projectors $P_{\varphi_\alpha}$, namely the
projectors onto its eigenvectors, and a set of values, namely its eigenvalues. Now call
the values “measured values”, or again “measurable values”, and call the operator
“measurable”. The experiment $\mathscr{E}$ with which $\hat{A}$ is associated can be referred to as
the “measurement of $\hat{A}$”. So there you have it. Confusion is programmed, since each
$\hat{A}$ is now an “observable” and hence has a life of its own. But are all self-adjoint
operators observables? If not, then which ones are? Is there a “classical” hidden
variable behind the observable whose value is really measured? And in this way,
many irrelevant questions arise.
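
For a Hermitian 2×2 matrix the step from the operator back to eigenvalues and projectors can be written in closed form. The following sketch assumes distinct eigenvalues and uses the elementary formula $P_\pm = (\hat{A} - \lambda_\mp I)/(\lambda_\pm - \lambda_\mp)$; the function name `spectral_decomposition` and the example matrix are hypothetical.

```python
import math


def spectral_decomposition(A):
    """Eigenvalues and eigenprojectors of a 2x2 Hermitian matrix with distinct
    eigenvalues, returned as [(lambda_minus, P_minus), (lambda_plus, P_plus)]."""
    a, b, d = A[0][0].real, A[0][1], A[1][1].real
    mid = (a + d) / 2
    gap = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    if gap == 0:
        raise ValueError("degenerate case: A is a multiple of the identity")
    lam_minus, lam_plus = mid - gap, mid + gap

    def eigenprojector(lam, lam_other):
        # (A - lam_other * I) / (lam - lam_other) is 1 on the lam-eigenvector
        # and 0 on the other eigenvector.
        return [[(A[i][j] - (lam_other if i == j else 0)) / (lam - lam_other)
                 for j in range(2)] for i in range(2)]

    return [(lam_minus, eigenprojector(lam_minus, lam_plus)),
            (lam_plus, eigenprojector(lam_plus, lam_minus))]


A = [[2, 1 - 1j], [1 + 1j, 3]]    # hypothetical Hermitian matrix
(lam_minus, P_minus), (lam_plus, P_plus) = spectral_decomposition(A)

# Rebuild A from its values and projectors, as in (12.8).
rebuilt = [[lam_minus * P_minus[i][j] + lam_plus * P_plus[i][j] for j in range(2)]
           for i in range(2)]
```

Here the eigenvalues come out as 1 and 4, and `rebuilt` reproduces `A` entry by entry. In higher dimensions one would use a numerical eigenvalue routine instead of a closed formula.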

An example for (12.1–12.2) and its association with an operator $\hat{A}$ is provided by
our discussion of spin. Suppose the spinor wave function is

\[
\begin{pmatrix} \psi_1(x) \\ \psi_2(x) \end{pmatrix}
\]

in the eigenbasis of $\sigma_z$, and suppose the Stern–Gerlach magnet is oriented in the
$\mathbf{a}$-direction. Then the wave function will split into two wave packets $\varphi_+$ and $\varphi_-$, the
eigenfunctions of $\mathbf{a} \cdot \boldsymbol{\sigma}$, as the example shows. The particle will be in the $\pm$ packet
with probability

\[
\left\| P^{\pm} \begin{pmatrix} \psi_1(x) \\ \psi_2(x) \end{pmatrix} \right\|^2 ,
\]

where $P^{\pm}$ denotes the projector onto the corresponding spinor component. The
associated operator is simply

\[
\hat{A}_{\mathbf{a}} = +\tfrac{1}{2} P^{+} - \tfrac{1}{2} P^{-} = \tfrac{1}{2}\, \mathbf{a} \cdot \boldsymbol{\sigma} , \tag{12.13}
\]

the spin operator in the direction of $\mathbf{a}$, $\|\mathbf{a}\| = 1$.
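
A small numerical illustration of the spin example may help. The sketch assumes a position-independent spinor, so that the norm reduces to a sum over the two spinor components, and it uses the standard form $P^{\pm} = (I \pm \mathbf{a} \cdot \boldsymbol{\sigma})/2$ of the spectral projectors of $\mathbf{a} \cdot \boldsymbol{\sigma}$; the function names and the chosen direction are hypothetical.

```python
import math


def a_dot_sigma(a):
    """Matrix of a.sigma in the eigenbasis of sigma_z, for a = (ax, ay, az)."""
    ax, ay, az = a
    return [[az, ax - 1j * ay], [ax + 1j * ay, -az]]


def spin_projectors(a):
    """P_plus and P_minus = (I +/- a.sigma)/2 for a unit vector a."""
    sigma_a = a_dot_sigma(a)
    identity = [[1, 0], [0, 1]]
    plus = [[(identity[i][j] + sigma_a[i][j]) / 2 for j in range(2)] for i in range(2)]
    minus = [[(identity[i][j] - sigma_a[i][j]) / 2 for j in range(2)] for i in range(2)]
    return plus, minus


def packet_probabilities(a, spinor):
    """Weights ||P_plus psi||^2 and ||P_minus psi||^2 relative to ||psi||^2."""
    norm2 = sum(abs(z) ** 2 for z in spinor)
    result = []
    for P in spin_projectors(a):
        image = [sum(P[i][k] * spinor[k] for k in range(2)) for i in range(2)]
        result.append(sum(abs(z) ** 2 for z in image) / norm2)
    return tuple(result)


theta = math.pi / 3                                   # angle between a and the z-axis
a = (math.sin(theta), 0.0, math.cos(theta))
p_plus, p_minus = packet_probabilities(a, [1, 0])     # spinor "up" along z
```

For a spinor that is up along $z$ and a magnet tilted by the angle $\theta$, the weight of the $+$ packet is $\cos^2(\theta/2)$, which is 0.75 for the angle chosen here.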

This example reveals an interesting feature, namely that we did not have to talk 
about measurement apparatus in order to get the orthogonality of the possible effective
wave functions. The Schrödinger evolution of the particle through the Stern–Gerlach
magnet has already taken care of that. In the end one only needs to detect
which of the wave packets the particle is riding along with. This adds something to 
the false intuition that there is a genuine quantity behind the observable “which is 
measured”. 

The moral of the foregoing is that one can associate operators with experiments 
as book-keeping devices for the statistics, where an “orthogonal” splitting of the 
wave function often comes for free. The apparatus is not even needed for that. In the
end, the apparatus detects where the particle sits. 

Let us summarize the mathematical structures that emerge from the statistics. 
The space of wave functions will be a linear space (reflecting the linearity of the 
Schrödinger evolution!) with a scalar product. A Hilbert space $\mathscr{H}$ will therefore be
the most convenient setting. The space will in general be infinite-dimensional, as
it contains the set of wave functions on configuration space. The set of values $\Lambda$
which the pointer points at has so far been chosen as a discrete set (but that will
change in the next section). For simplicity, we considered above the situation where
one eigenvector $\varphi_\alpha$ corresponds to each value $\lambda_\alpha$. In general this will not be so,
because in general the orthogonal projector corresponding to a value $\lambda$, which we
denote by $P_\lambda$, will not be one-dimensional, but rather will project onto a higher-,
even infinite-dimensional subspace $\mathscr{H}_\lambda$ which contains all wave functions correlating
appropriately with the pointer position, i.e., which lead to the displayed value $\lambda$.
The $(P_\lambda)_{\lambda \in \Lambda}$ form a family of orthogonal (and hence self-adjoint) projections.

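The statistical outcome of the experiment $\mathscr{E}$ can be encoded in the operator

\[
\hat{A} = \sum_{\lambda \in \Lambda} \lambda P_\lambda , \quad \text{with} \quad \sum_{\lambda \in \Lambda} P_\lambda = I , \tag{12.14}
\]

or in short

\[
\mathscr{E} \;\Longrightarrow\; (\lambda, P_\lambda)_{\lambda \in \Lambda} \;\Longleftrightarrow\; \hat{A} . \tag{12.15}
\]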


We shall focus below on the first arrow. The second arrow is mathematics, the sub- 
ject of the so-called spectral representation of a self-adjoint operator. The backwards 
arrow is mathematically somewhat difficult when the operator A is unbounded and 
has a continuous spectrum. Such operators will be discussed next. 


12.2 Who Is Afraid of PVMs and POVMs? 


We showed in the previous discussion of the experiment (12.1—12.2), where pointers 
display values, that the quantum equilibrium distribution for the probabilities of the 
values may be conveniently computed from a family of orthogonal projectors. The 
orthogonality of the projectors arises in general from the orthogonality of pointer
wave functions, since they have disjoint supports in configuration space. But the spin 
example shows that orthogonality may sometimes come without invoking pointers. 
The apparatus merely detects the position of the particle. In fact, from a Bohmian 
perspective, pointers are not needed to create facts or values. The Bohmian particle 
has a position, and one only needs to detect where it is. So sometimes the apparatus 
does not need to be mentioned at all, if all it does is to detect the actual state of 
affairs. 
From a Bohmian perspective we should therefore formulate the experimental
situation in much more general terms. A system (configuration $x$) which may get
coupled to some apparatus (configuration $y$) defines (together with the apparatus,
if the apparatus plays some role) the configuration space $\mathcal{Q}$. On the latter, one has
a coarse-graining function $F : \mathcal{Q} \to \Lambda$ which maps configurations to the displayed
values (e.g., pointer positions) in $\Lambda$. Quantum equilibrium determines the statistics
of the values. Therefore the most general formalism encoding quantum equilibrium 
and the statistics of measurement experiments emerges from the sequence 


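\[
\begin{array}{cl}
\varphi(x) & \\
\big\downarrow & \text{system couples (possibly) to apparatus} \\
\Psi(x,y) = \varphi(x)\,\Phi(y) & \\
\big\downarrow & \text{Schrödinger evolution} \\
\Psi_T(x,y) & \text{(12.16a)} \\
\big\Downarrow & \text{quantum equilibrium distribution} \\
\mathbb{P}^{\Psi_T}(d^m x\, d^n y) = |\Psi_T(x,y)|^2\, d^m x\, d^n y & \text{(12.16b)} \\
\big\downarrow & \text{and we are only interested in} \\
\mathbb{P}^{\varphi}(A) := \mathbb{P}^{\Psi_T}\big(F^{-1}(A)\big) , \quad A \subset \Lambda , & \text{(12.16c)}
\end{array}
\]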


where single arrows denote linear maps and the double arrow denotes a bilinear or, 
more correctly, a sesquilinear (because of the complex conjugation of one of the 
factors) map. The “possibly” on the first arrow expresses the possibility that there 
need not be any coupling to a piece of apparatus, so that we focus only on the system 
configuration. As always, T is the duration of the experiment. 
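
The last two steps of the sequence, from the final wave function to the distribution of displayed values, amount to pushing the density $|\Psi_T|^2$ forward under the coarse-graining map $F$. The sketch below illustrates this on a finite configuration space; the final wave function `psi_T`, the labels "left" and "right", and the helper names are hypothetical.

```python
import math


def outcome_distribution(psi_T, F):
    """Push the discrete density |psi_T(q)|^2 forward under the coarse-graining F."""
    distribution = {}
    for q, amplitude in psi_T.items():
        value = F(q)
        distribution[value] = distribution.get(value, 0.0) + abs(amplitude) ** 2
    return distribution


def probability(distribution, A):
    """P(A): total weight of the displayed values lying in the set A."""
    return sum(p for value, p in distribution.items() if value in A)


# Hypothetical final wave function on configurations q = (x, y):
# x in {0, 1} is the system, y in {0, 1, 2, 3} is the pointer.
s = 1 / math.sqrt(2)
psi_T = {}
for x in (0, 1):
    for y in (0, 1):
        psi_T[(x, y)] = 0.6 * s * s                            # pointer on the left
    for y, weight in ((2, 0.6), (3, 0.8)):
        psi_T[(x, y)] = 0.8 * (s if x == 0 else -s) * weight   # pointer on the right


def F(q):
    """Coarse-graining: read off only which side the pointer is on."""
    x, y = q
    return "left" if y < 2 else "right"


distribution = outcome_distribution(psi_T, F)
```

The map from `psi_T` to `distribution` is quadratic in the wave function, while everything before it is linear, which mirrors the distinction between single and double arrows in the sequence.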

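Before we become more abstract, let us discuss a few examples. In (12.1–12.2),
we have

\[
F(x,y) = F(y) \in \{\lambda_1, \dots, \lambda_N\} = \Lambda ,
\]

and in view of (12.9)

\[
\mathbb{P}^{\Psi_T}\big(F^{-1}(\{\lambda_i\})\big) = \mathbb{P}^{\varphi}(\{\lambda_i\}) . \tag{12.17}
\]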


In the Stern–Gerlach experiment,

\[
F(x,y) = F(x) \in \left\{ -\tfrac{1}{2}, +\tfrac{1}{2} \right\} .
\]


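The most direct and simple example, however, is the position of a particle $x \in \mathbb{R}^3$
with effective wave function $\varphi$, without apparatus, and with $T = 0$, i.e., $\Psi_T = \varphi$.
Taking $F = \mathrm{id}$ on $\Lambda = \mathbb{R}^3$, the sequences reduce to

\[
\varphi(x) \;\Longrightarrow\; |\varphi(x)|^2 , \quad \text{or} \quad \mathbb{P}^{\varphi}(d^3 x) = |\varphi(x)|^2\, d^3 x . \tag{12.18}
\]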
We would like now to shift the emphasis from the meaning of the family of
projectors introduced in the previous section towards the idea of a projection-valued
measure (PVM), a notion which comes naturally with (12.16a–c) and which
combines discrete values with continuous values, as in (12.18). From the sequence, we
read the bilinear map from “wave function space” to measures. In the case of
(12.1–12.2), the measure is a discrete projection-valued measure (PVM) on subsets of the
value space $\Lambda$. Using the rules of probability and the orthogonality of the projectors
corresponding to different values, (12.17) implies


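\[
\begin{aligned}
\mathbb{P}^{\varphi}\big(\{\lambda_{\alpha_1}, \dots, \lambda_{\alpha_n}\}\big)
&= \mathbb{P}^{\Psi_T}\big(F^{-1}(\{\lambda_{\alpha_1}, \dots, \lambda_{\alpha_n}\})\big) \\
&= \mathbb{P}^{\Psi_T}\big(\{(x,y) \,|\, F(x,y) \in \{\lambda_{\alpha_1}, \dots, \lambda_{\alpha_n}\}\}\big) \\
&= \sum_i \mathbb{P}^{\Psi_T}\big(F^{-1}(\{\lambda_{\alpha_i}\})\big) = \sum_i \mathbb{P}^{\varphi}(\{\lambda_{\alpha_i}\}) \\
&= \sum_i \| P_{\lambda_{\alpha_i}} \varphi \|^2 = \sum_i \langle \varphi | P_{\lambda_{\alpha_i}} \varphi \rangle \\
&= \Big\langle \varphi \Big| \sum_i P_{\lambda_{\alpha_i}} \varphi \Big\rangle =: \langle \varphi | P_{\{\lambda_{\alpha_1}, \dots, \lambda_{\alpha_n}\}} \varphi \rangle .
\end{aligned}
\tag{12.19}
\]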


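Hence to every subset of values $\{\lambda_{\alpha_1}, \dots, \lambda_{\alpha_n}\}$ corresponds a projector

\[
P_{\{\lambda_{\alpha_1}, \dots, \lambda_{\alpha_n}\}} = \sum_i P_{\lambda_{\alpha_i}}
\]

indexed by that set. Hence we can view this as a family $(P_A)_{A \subset \Lambda}$ of projectors which
acts like a (discrete) measure on the subsets of $\Lambda$: a PVM.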

Since we are used to thinking of a measure on subsets of a continuum, the ab- 
straction to the PVM structure becomes evident if the value set is continuous. So 
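let us look at the position of the Bohmian particle (12.18), where the value space is
now the continuum $\Lambda = \mathbb{R}^3$. In this case $F = \mathrm{id}$ and

\[
\begin{aligned}
\mathbb{P}^{\varphi}(A) = \mathbb{P}^{\varphi}\big(F^{-1}(A)\big) &= \int \chi_A(x)\, |\varphi(x)|^2\, d^3 x \\
&= \langle \varphi | \chi_A \varphi \rangle \\
&=: \langle \varphi | \chi_{\{A \ni x\}} \varphi \rangle .
\end{aligned}
\tag{12.20}
\]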


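What replaces the projectors $P_\lambda$? Obviously, $\chi_{\{A \ni x\}}$ takes on the role of $P_\lambda$. This
suggests defining the continuous PVM $O$ of (measurable) subsets of $\Lambda$ ($= \mathbb{R}^3$) taking
values in the space of orthogonal (self-adjoint) projectors:

\[
O_A \varphi(x) =
\begin{cases}
\varphi(x) , & x \in A \subset \mathbb{R}^3 , \\
0 , & \text{otherwise} ,
\end{cases}
\tag{12.21}
\]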


so that 
\[
\mathbb{P}^{\varphi}(A) = \langle \varphi | O_A \varphi \rangle = \int_A |\varphi(x)|^2\, d^3 x \tag{12.22}
\]

comes out just right. The orthogonality

\[
O_A O_B \varphi(x) = \chi_A(x)\, \chi_B(x)\, \varphi(x) = \chi_{A \cap B}(x)\, \varphi(x) = 0 , \tag{12.23}
\]

for $A \cap B = \emptyset$ is obvious, and so is the projector property

\[
O_A^2 = O_A , \tag{12.24}
\]

and the normalization

\[
O_{\mathbb{R}^3} = I , \tag{12.25}
\]

the identity operator.
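
The properties (12.22–12.25) of the position PVM can be checked on a grid, where $O_A$ is simply multiplication by the indicator function of a set of grid points. The following is a minimal sketch in one dimension; the Gaussian wave function, the grid, the index sets and the helper names are hypothetical choices made for the illustration.

```python
import math


def indicator_operator(A):
    """Return O_A: multiplication by the indicator function of the index set A."""
    def O_A(phi):
        return [amp if i in A else 0.0 for i, amp in enumerate(phi)]
    return O_A


def sandwich(phi, op, dx):
    """Grid version of <phi|op phi>, with the integral replaced by a Riemann sum."""
    image = op(phi)
    return sum((a.conjugate() * b).real for a, b in zip(phi, image)) * dx


# Gaussian wave function sampled on a 1D grid and normalized on that grid.
dx = 0.1
grid = [-5 + i * dx for i in range(101)]
raw = [math.exp(-x * x / 2) for x in grid]
norm = math.sqrt(sum(r * r for r in raw) * dx)
phi = [complex(r / norm) for r in raw]

A = set(range(0, 60))                 # indices 0..59, roughly the points left of x = 1
B = set(range(40, 101))               # indices 40..100, roughly the points from x = -1 on
everything = set(range(101))

O_A, O_B = indicator_operator(A), indicator_operator(B)
O_AB = indicator_operator(A.intersection(B))
O_complement = indicator_operator(everything - A)

print(O_A(O_B(phi)) == O_AB(phi))     # product of projectors = projector of the intersection
print(O_A(O_A(phi)) == O_A(phi))      # projector property (12.24)
print(sandwich(phi, O_A, dx) + sandwich(phi, O_complement, dx))   # total probability
```

The first two comparisons are exact, since multiplying by indicator functions involves no rounding, and the last line prints a number equal to one up to rounding. For disjoint sets the product operator maps every wave function to zero, as in (12.23).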
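In Dirac’s shorthand notation, we write

\[
\langle x | \psi \rangle = \psi(x) \tag{12.26}
\]

and

\[
dO_x = |x\rangle \langle x|\, d^3 x , \tag{12.27}
\]

with

\[
\langle x | x' \rangle = \delta(x - x') \tag{12.28}
\]

representing the orthogonality of the projectors, while

\[
\int |x\rangle \langle x|\, d^3 x = I \tag{12.29}
\]

represents the normalization, often referred to as the completeness relation. We put

\[
\hat{X} := \int x\, dO_x = \int x\, |x\rangle \langle x|\, d^3 x . \tag{12.30}
\]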

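Therefore with the measurement of position (if one cares to detect where the
particle is) one can associate a hatted position $\hat{X}$, an observable which encodes all the
statistics. As in (12.12), we can compute the variance of the position distribution,
viz., $\mathbb{E}^{\varphi}(X^2) - \mathbb{E}^{\varphi}(X)^2$. We find

\[
\begin{aligned}
\mathbb{E}^{\varphi}(X^2) - \mathbb{E}^{\varphi}(X)^2
&= \int x^2\, |\varphi(x)|^2\, d^3 x - \Big( \int x\, |\varphi(x)|^2\, d^3 x \Big)^2 \\
&= \int x^2\, \langle \varphi | dO_x \varphi \rangle - \Big( \int x\, \langle \varphi | dO_x \varphi \rangle \Big)^2 \\
&= \langle \varphi | \hat{X}^2 \varphi \rangle - \langle \varphi | \hat{X} \varphi \rangle^2 ,
\end{aligned}
\tag{12.31}
\]

using (12.28) to obtain the last equality.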
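A related and rather important example dealing with a continuum of values is
provided by the asymptotic velocity and its distribution, which we calculated in
Chap. 9:

\[
\mathbb{E}^{\varphi}\big(f(V_\infty)\big) = \lim_{t \to \infty} \int f\Big(\frac{x}{t}\Big)\, |\varphi_t(x)|^2\, d^3 x
= \int f\Big(\frac{\hbar}{m} k\Big)\, |\hat{\varphi}(k)|^2\, d^3 k . \tag{12.32}
\]

Taking $f = \chi_A$, we can rephrase this in the sense of our sequence (12.16a–c), choosing
$T = t$, a time-dependent coarse-graining function

\[
F_t(x) = \frac{x}{t} ,
\]

and no apparatus, i.e., $\Psi_T = \varphi_t$. We then obtain

\[
\begin{aligned}
\mathbb{E}^{\varphi}\big(\chi_A(V_\infty)\big) = \lim_{t \to \infty} \mathbb{P}^{\varphi_t}\big(F_t^{-1}(A)\big)
&= \int \chi_A\Big(\frac{\hbar k}{m}\Big)\, |\hat{\varphi}(k)|^2\, d^3 k \\
&= \langle \hat{\varphi} | \chi_{mA/\hbar}\, \hat{\varphi} \rangle = \langle \varphi | \mathcal{F}^{-1} \chi_{mA/\hbar}\, \mathcal{F} \varphi \rangle \\
&=: \langle \varphi | V_A \varphi \rangle ,
\end{aligned}
\tag{12.33}
\]

where $V$ is also a PVM, since the Fourier transformation $\mathcal{F}$ acts isometrically on the Hilbert space of
square-integrable functions (see Sect. 13.1.2).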

Let us rephrase this in a less prosaic way. Let us recall what a measurement 
experiment for the asymptotic velocity might look like. Prepare a particle with a 
wave packet around here, let it evolve freely for quite some time, and catch the 
particle on a screen or with some other appropriate detector far away. Then take 
the ratio of the distance traveled to the time taken, and run the experiments many 
times to get the statistics. These are given by $|\hat{\varphi}|^2$, as we computed from the quantum
equilibrium distribution in Chap. 9, after (9.25).

To recall the simple explanation for that, let us remember what happens during 
the free evolution. The wave packet $\varphi$ will spread according to the dispersion
relation. Why is this? It is because the packet $\varphi$ is composed of “plane wave packets”
with wave numbers $k$, and each such plane wave moves with velocity $\hbar k/m$. That is
the meaning of the dispersion relation. Hence the plane wave packets separate and
after a long time occupy separate regions in configuration space. That is, they
become orthogonal. The orthogonal plane wave parts $e^{ik \cdot x}$ will thus replace
the $\varphi_\alpha$ in the discrete examples.

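Dirac invented the notation $|k\rangle \langle k|$ [already used in (12.26)] for the projector onto
the plane wave $\langle x | k \rangle = e^{ik \cdot x}$, meaning that $\langle k | \varphi \rangle = \hat{\varphi}(k)$, the Fourier transform of
$\varphi$. As for position, a subset of $\Lambda = \mathbb{R}^3$ gets mapped to a projector. In fact, $A \subset \mathbb{R}^3$
is mapped to the orthogonal projection given by

\[
V_A = \int \chi_A\Big(\frac{\hbar}{m} k\Big)\, |k\rangle \langle k|\, d^3 k = \int_{mA/\hbar} |k\rangle \langle k|\, d^3 k . \tag{12.34}
\]

The probability for the asymptotic velocity to lie within $A$ when the wave function
is $\varphi$ is

\[
\mathbb{P}^{\varphi}(V_\infty \in A) = \langle \varphi | V_A \varphi \rangle
= \int_{mA/\hbar} \langle \varphi | k \rangle \langle k | \varphi \rangle\, d^3 k
= \int_{mA/\hbar} |\langle \varphi | k \rangle|^2\, d^3 k .
\]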


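When $A = \mathbb{R}^3$, we have $V_{\mathbb{R}^3} = I$, the identity operator. In terms of the sandwich with
a wave packet, this is simply Plancherel’s identity

\[
\langle \varphi | V_{\mathbb{R}^3} \varphi \rangle = \int_{\mathbb{R}^3} |\langle \varphi | k \rangle|^2\, d^3 k = \int_{\mathbb{R}^3} |\varphi(x)|^2\, d^3 x = 1 .
\]

The orthogonality is again captured by

\[
\langle k | k' \rangle = \delta(k - k') ,
\]

and one has $V_{\mathbb{R}^3} = I$, the so-called completeness relation:

\[
\int_{\mathbb{R}^3} |k\rangle \langle k|\, d^3 k = I .
\]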
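
A discrete analogue makes these statements concrete. The sketch below replaces the Fourier transformation by the unitary discrete Fourier transform on eight grid points (an assumption made purely for illustration) and builds the analogue of $V_A = \mathcal{F}^{-1} \chi\, \mathcal{F}$ for a set `K` of wave-number indices; the wave function and all helper names are hypothetical.

```python
import cmath
import math


def dft(phi, sign=-1):
    """Unitary discrete Fourier transform; sign=+1 gives the inverse transform."""
    n = len(phi)
    return [sum(phi[x] * cmath.exp(sign * 2j * math.pi * k * x / n) for x in range(n))
            / math.sqrt(n) for k in range(n)]


def norm2(phi):
    """Squared norm: the sum of the squared moduli."""
    return sum(abs(z) ** 2 for z in phi)


def inner(u, v):
    """Scalar product <u|v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def wavenumber_projector(K):
    """Discrete analogue of V_A = F^{-1} chi F, for a set K of wave-number indices."""
    def V(phi):
        hat = dft(phi)
        kept = [h if k in K else 0j for k, h in enumerate(hat)]
        return dft(kept, sign=+1)
    return V


# Hypothetical normalized wave function on eight grid points.
phi = [complex(math.exp(-(x - 3.5) ** 2 / 4), 0.3 * x) for x in range(8)]
scale = math.sqrt(norm2(phi))
phi = [z / scale for z in phi]

K = {0, 1, 7}
V_K = wavenumber_projector(K)
phi_hat = dft(phi)

plancherel = norm2(phi_hat)                          # equals norm2(phi) = 1
probability = inner(phi, V_K(phi)).real              # <phi|V_K phi>
direct = sum(abs(phi_hat[k]) ** 2 for k in K)        # sum of |phi_hat(k)|^2 over K
```

Up to rounding, `plancherel` equals one, and `probability` agrees with `direct`, the discrete counterpart of the formula for $\mathbb{P}^{\varphi}(V_\infty \in A)$. Applying `V_K` twice gives the same result as applying it once, so it is indeed a projector.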


We find that the asymptotic velocity can also be associated with a hatted observable,
namely

\[
\hat{V}_\infty = \int \frac{\hbar}{m} k\, |k\rangle \langle k|\, d^3 k . \tag{12.35}
\]


Multiplying this by m, the mass of the particle, we get a hatted momentum, usually 
called the momentum operator. 

Mathematically, this is nothing but a kind of diagonal representation. Indeed, 
a PVM is (in functional analytical terms) the spectral resolution of a self-adjoint 
operator. There is a one-to-one correspondence between PVMs $dP_\lambda$ and self-adjoint
operators $A$. The former defines a self-adjoint operator $A = \int \lambda\, dP_\lambda$, and the latter
defines a projection-valued measure via its spectral representation. All this, and all
the examples above, will be put on a mathematically rigorous basis in Chap. 15. 
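The correspondence between a PVM and its self-adjoint operator can be made concrete in the smallest nontrivial case. The sketch below uses the Pauli matrix $\sigma_x$ as an assumed example (it is not taken from the text) and plain nested lists as matrices; the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(a, b):
    """Entrywise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def scale(c, a):
    """Multiply a 2x2 matrix by the number c."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
sigma_x = [[0.0, 1.0], [1.0, 0.0]]

# Projectors onto the eigenspaces of sigma_x for the eigenvalues +1 and -1.
p_plus = scale(0.5, add(identity, sigma_x))
p_minus = scale(0.5, add(identity, scale(-1.0, sigma_x)))

# The discrete PVM {P_+, P_-} rebuilds the operator: A = sum of lambda * P_lambda.
reconstructed = add(scale(+1.0, p_plus), scale(-1.0, p_minus))

print(p_plus, p_minus, reconstructed)
```

Here the two projectors are idempotent, mutually orthogonal and sum to the identity, and the weighted sum returns $\sigma_x$; this is the discrete version of $A = \int \lambda\, dP_\lambda$.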
Let us make a historical remark. The move from classical variables to operator
observables was Heisenberg’s invention to explain the discreteness of atomic
spectra. But he did so without Schrödinger’s equation and quantum equilibrium.
He postulated that this abstract kind of observable is the new measurable quantity. 
Many physicists hoped that some “ordinary” quantities would underlie these ab- 
stract observables, and that whilst these quantities remained hidden, they could be 
held responsible for the outcome in a measurement. 

Bohr on the other hand held the clear and correct view that, in an experiment 
which “measures an observable”, nothing is actually being measured. We can un- 
derstand why that is correct. The role of the observables is just a book-keeping role 
for the statistics of an experiment. More generally, we may rewrite (12.15) using the 
notion of PVM, which can be a discrete measure as in (12.15) or a continuous mea- 
sure like Lebesgue measure, to highlight once again the association of an operator 
with the experiment (12.16a): 


\[ \mathscr{E} \Longrightarrow (\Lambda, dP_\lambda) \Longrightarrow A = \int_\Lambda \lambda\, dP_\lambda . \tag{12.36} \]


But sometimes something is measured: the Bohmian positions. 

Note in passing that the position observable and the momentum observable are
non-commuting! This is easily seen, since $|k\rangle\langle k|$ does not commute with $|x\rangle\langle x|$. But
this is no world-shattering discovery, for there is absolutely no reason why they
should commute. And that the variance in a position measurement of a particle with
effective wave function $\varphi$ is roughly inverse to the variance of the asymptotic
velocity of the particle is also rather unexciting. Or better, it is trivially so in quantum
equilibrium, once one understands the dispersion of waves. In short, Heisenberg’s
uncertainty relation is wholly unsurprising once quantum equilibrium has been
understood.


Remark 12.1. On the Dirac Formalism

In the Dirac formalism one uses bra $\langle\cdot|$ and ket $|\cdot\rangle$ symbols to denote dual vectors
and vectors, as in (12.5). But the symbolism becomes technically powerful when the
same symbols are used in the PVMs, as in (12.27). One must be aware, however,
that $|\varphi\rangle$ denotes a vector in Hilbert space which has a finite length $\sqrt{\langle\varphi|\varphi\rangle}$, and that
$|x\rangle$ has no such meaning since $\langle x|x'\rangle = \delta(x-x')$. But then, once one has that clear
in one’s mind, it does not hurt to think heuristically of $|x\rangle$ as a wave packet which is
“highly” peaked at $x$.


Now we come to POVMs. What is the general abstract structure emerging from
(12.16a–c)? At the end of (12.16c) stands a positive measure (a probability measure)
on $\Lambda$, and that is built from a bilinear map acting on wave functions. In other words,
wave functions are mapped (sesquilinearly) to positive measures. Without the wave
function sandwich, the measure is operator-valued. Nothing says that the operators
must be projectors, but they must be positive operators. A positive operator is one
for which the sandwich with a wave function is always a positive number. We denote
the POVM simply by $dP_\lambda$, as a measure on $\Lambda$. PVMs are special cases of POVMs.
Thus the most general structure one can infer from the experiment (12.16a) is

\[ \mathscr{E} \Longrightarrow (\Lambda, dP_\lambda) , \tag{12.37} \]


without the further double arrow to a book-keeping operator. We shall see from the 
following example why there is no point in introducing a book-keeping operator 
when the POVM is not a PVM. Imagine a detection of position. Realistically, such 
a detection will come with a measurement error. How does quantum mechanics 
handle that with the position observable? 

We know how to deal with this trivially in Bohmian mechanics. Let $\rho(x)$ be
the probability density on $\mathbb{R}^n$ which describes the error due to the apparatus. The
measured position $\tilde X$ is therefore the sum of two random variables $X + Y$, with $X$
distributed according to $|\varphi|^2$ and $Y$ distributed according to $\rho$. It is reasonable to
assume that $X$ and $Y$ are independent, which implies that the distribution of $\tilde X$ is the
convolution¹

\[ \tilde\rho(x) = \int \rho(x-y)\,|\varphi(y)|^2\, d^n y . \]


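The probability is therefore

\[ \mathbb{P}^\varphi(\tilde X \in A) = \int_A \tilde\rho(x)\, d^n x = \iint \chi_A(x)\,\rho(x-y)\,|\varphi(y)|^2\, d^n y\, d^n x =: \langle\varphi|O_A\varphi\rangle , \tag{12.38} \]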


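where we have introduced the POVM

\[ O_A = \int \rho(x-y)\,\chi_A(x)\, d^n x , \]

i.e.,

\[ O_A : \varphi \longmapsto \int \rho(x-y)\,\chi_A(x)\, d^n x \;\varphi(y) . \tag{12.39} \]

In general,

\[ O_A^2 \neq O_A , \]

where equality holds only if $\rho(x) = \delta(x)$, i.e., when the POVM is a PVM.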
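That $O_A$ fails to be a projector is easy to see numerically, since $O_A$ acts as multiplication by the function $o_A(y) = \int_A \rho(x-y)\, dx$. The sketch below assumes one dimension, a Gaussian error density and the interval $A = [0,1]$; all of these choices, and the function names, are illustrative assumptions rather than part of the text.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def effect(y, a, b, error_width):
    """o_A(y): integral over A = [a, b] of a Gaussian error density centered at y."""
    return normal_cdf((b - y) / error_width) - normal_cdf((a - y) / error_width)


a, b = 0.0, 1.0
points = [-0.5, 0.25, 0.5, 0.75, 1.5]

coarse = [effect(y, a, b, 0.3) for y in points]   # noticeable apparatus error
sharp = [effect(y, a, b, 1e-6) for y in points]   # error density close to a delta

# O_A^2 multiplies by o_A(y)^2, so idempotence fails wherever 0 < o_A(y) < 1.
idempotence_defect = [v * v - v for v in coarse]

print(coarse)
print(sharp)
```

With a visible error the multiplier takes values strictly between 0 and 1, so $O_A^2 \neq O_A$; as the error width shrinks, the multiplier approaches the indicator function of $A$ and the PVM is recovered.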


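¹ Consider the Fourier transform

\[ \hat{\tilde\rho}(k) = \mathbb{E}\big(e^{ik\cdot\tilde X}\big) = \mathbb{E}\big(e^{ik\cdot(X+Y)}\big) = \mathbb{E}\big(e^{ik\cdot X}\big)\,\mathbb{E}\big(e^{ik\cdot Y}\big) = \widehat{|\varphi|^2}(k)\,\hat\rho(k) , \]

where we use the independence of X and Y to obtain the third equality. Now recall that the Fourier
transform of a product is a convolution.

Suppose we want to know the variance of $\tilde X$ and compare that with (12.31). Then
computing the second moment, we obtain

\[ \langle \tilde X^2\rangle = \int x^2\,\tilde\rho(x)\, d^n x = \int x^2\,\langle\varphi|dO_x\,\varphi\rangle = \iint x^2\,\rho(x-y)\,|\varphi(y)|^2\, d^n y\, d^n x , \tag{12.40} \]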


and that is all there is to it. Compare this with (12.31) to understand why the intro- 
duction of an operator observable serves no purpose. 
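To see what the second moment contains, one can carry the computation one step further. The following is a sketch, writing $\tilde X = X + Y$ for the measured position (the tilde is notation assumed here), substituting $x = y + z$, and using only the independence of $X$ and $Y$:

```latex
\begin{align*}
\langle \tilde X^2\rangle
&= \iint x^2\,\rho(x-y)\,|\varphi(y)|^2\, d^n y\, d^n x
 = \iint (y+z)^2\,\rho(z)\,|\varphi(y)|^2\, d^n z\, d^n y \\
&= \langle X^2\rangle + 2\,\langle X\rangle\cdot\langle Y\rangle + \langle Y^2\rangle , \\
\operatorname{Var}(\tilde X)
&= \langle \tilde X^2\rangle - \big(\langle X\rangle + \langle Y\rangle\big)^2
 = \operatorname{Var}(X) + \operatorname{Var}(Y) .
\end{align*}
```

The variance of the measured position is thus the quantum equilibrium variance plus the variance of the apparatus error, obtained directly from the densities without any operator.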


12.2.1 The Theory Decides What Is Measurable 


It has been said that a theory must only be about measurable quantities. But that 
is not a very intelligent thing to say. How does one know beforehand, before the 
theory is built, what is measurable and what is not? If the theory does not contain 
an electric field, the electric field is not measurable. If it contains an electric field 
then obviously the theory will have to tell us whether and how the electric field 
is measurable. It is the theory which reveals the world to us in notions that are 
particular to the theory, and which make known to us the correct elements of the 
world, if the theory has the appeal of beauty and elegance. Recall the discussion 
on Wheeler–Feynman electromagnetism. In this theory, there is no electromagnetic
field, so nothing of that kind could ever be measured. In Maxwell–Lorentz theory,
there is. It is true that in these examples the variables entering the theories turn out to 
be measurable, but measurability was not the key consideration when constructing 
the theories. It turned out that way, accidentally as it were. But it need not be the 
case. Here is an example. 

Bohmian mechanics contains variables which are not measurable in the sense of 
the experiment (12.16a). There is no apparatus measuring wave functions and no 
apparatus measuring the Bohmian velocity. The fact that there is no apparatus mea- 
suring wave functions has already been said, and the fact that there is no apparatus 
measuring the Bohmian velocity (and the trajectory) is of course related to this. But 
beware! Any Newtonian motion, i.e., classical motion which we see and measure, 
for example an apple falling from a tree or an electron in a cloud chamber, is a 
Bohmian motion, albeit one where the particle moves with the classically moving 
localized wave packet. Some care is therefore in order with such statements, and it 
is important to be clear about what is meant. 

The question intended is as follows: In a situation where quantum mechanical 
interference acts, can one measure the actual velocity of the particle? Here is a 
quick and very general argument based on POVMs which answers in the negative. 
The distribution of the values of the velocity must be given by the sesquilinear form 
(quantum equilibrium distribution) 


\[ \mathbb{P}^\psi(A) = \text{sesquilinear form}(\psi)(A) , \]

on the subsets $A \subset \mathbb{R}^3$. For sesquilinear forms, we have the binomial formula
estimate

\[ \mathbb{P}^{\psi_1+\alpha\psi_2}(A) \le 2\,\mathbb{P}^{\psi_1}(A) + 2\,\mathbb{P}^{\psi_2}(A) , \tag{12.41} \]

where $|\alpha| = 1$. Now take two real wave functions $\psi_1$, $\psi_2$, and $\alpha$ a complex phase
factor. Then $\psi_1 + \alpha\psi_2$ will generate some velocity field that is not everywhere zero,
while on the right-hand side the probabilities are concentrated on zero. So for a set
$A$ not containing 0, we get a contradiction since the right-hand side is zero.
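The estimate (12.41) is a parallelogram argument. Here is a sketch, under the assumption that the distribution has the form $\mathbb{P}^\psi(A) = \langle\psi|O_A\psi\rangle$ with some positive operator $O_A$ (the symbol $O_A$ is our notation for this purpose) and leaving normalization aside:

```latex
\begin{align*}
\langle \psi_1+\alpha\psi_2 \,|\, O_A (\psi_1+\alpha\psi_2)\rangle
+ \langle \psi_1-\alpha\psi_2 \,|\, O_A (\psi_1-\alpha\psi_2)\rangle
&= 2\,\langle\psi_1|O_A\psi_1\rangle + 2\,|\alpha|^2\,\langle\psi_2|O_A\psi_2\rangle , \\
\langle \psi_1-\alpha\psi_2 \,|\, O_A (\psi_1-\alpha\psi_2)\rangle \ge 0
\quad\Longrightarrow\quad
\mathbb{P}^{\psi_1+\alpha\psi_2}(A)
&\le 2\,\mathbb{P}^{\psi_1}(A) + 2\,\mathbb{P}^{\psi_2}(A)
\qquad (|\alpha| = 1) .
\end{align*}
```

The cross terms cancel in the sum on the left, and positivity allows one to drop the second term; nothing beyond sesquilinearity and positivity is used.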

What should we conclude? That Bohmian mechanics is not a physical theory, just because
of the rather foolish and unsubstantiated claim that a theory must only be about
measurable quantities? Of course not. Furthermore, one should not go too far with all
this formalism. Suffice it to say that we sometimes know the velocity even without
measuring it.² In the hydrogen ground
state, the electron is at rest. Measuring its position we know where it was and what 
velocity it had. And if the reader now points out that we only know the velocity 
because the theory tells us what it is, then she or he has understood. It is always 
the theory that tells us what exists, whether what exists is measurable, and if it is 
measurable, how to measure it. The reader should revisit the double slit experiment 
in Chap. 8 in order to understand that this provides another example of the maxim 
that it is the theory that decides what is measurable. 


12.2.2 Joint Probabilities 


We shall briefly discuss a sequence of measurements, performed one after the other,
and we do so in the case of a discrete PVM. Suppose we have two pieces of
measurement apparatus with pointers $\Phi_\alpha$, $\alpha \in I$, and $\Psi_\beta$, $\beta \in J$, which point to values
$\lambda_\alpha \in \Lambda$ and $\mu_\beta \in \Pi$. How is this described? As always, of course. We simply have
a slightly more complicated Schrödinger evolution (12.2), with the only difference
that it ends in

\[ \sum_{\alpha\in I}\sum_{\beta\in J} \varphi_{\beta\alpha}\,\Phi_\alpha\Psi_\beta , \tag{12.42} \]

where we keep the new effective wave functions $\varphi_{\beta\alpha} := P_\beta P_\alpha\varphi$ non-normalized for
reasons of notational simplicity. As a consequence, no $c_{\beta\alpha}$ appear.

Repeating the computation (12.3), measuring first with the $\Phi$ apparatus, we see
that the values $\lambda_\alpha$, $\mu_\beta$ come with “joint probability”

\[ \mathbb{P}^\varphi(\lambda_\alpha, \mu_\beta) = \|P_\beta P_\alpha\varphi\|^2 . \tag{12.43} \]


² One can, however, measure the Bohmian velocity in a so-called weak measurement [2].

This is all very simple, but there is something malicious going on, as indicated by
the quotation marks on “joint probability”. Let us now bring this to light.

When we wish to interpret the left-hand side as the probability distribution of 
two random variables, it should be the case that if we ignore the outcome of one of 
the measurements, i.e., if we sum over all values of one variable, then we should get 
the probability of the other variable: 


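\[ \sum_{\alpha\in I} \mathbb{P}^\varphi(\lambda_\alpha,\mu_\beta) = \mathbb{P}^\varphi(\mu_\beta) , \qquad \sum_{\beta\in J} \mathbb{P}^\varphi(\lambda_\alpha,\mu_\beta) = \mathbb{P}^\varphi(\lambda_\alpha) . \tag{12.44} \]

This is pure logic applied to the joint probability.³ Summing over all values of one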
random variable amounts to ignoring it, and we are only left with the probability 
distribution of the other random variable. But is this true? Well, it is true if the 
following computation can be carried out: 


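\begin{align*}
\|P_\beta P_\alpha\varphi\|^2 &= \langle P_\beta P_\alpha\varphi, P_\beta P_\alpha\varphi\rangle = \langle\varphi, (P_\beta P_\alpha)^* P_\beta P_\alpha\varphi\rangle \\
&= \langle\varphi, P_\alpha^* P_\beta^* P_\beta P_\alpha\varphi\rangle \\
&= \langle\varphi, P_\alpha P_\beta P_\beta P_\alpha\varphi\rangle \\
&= \langle\varphi, P_\alpha P_\beta P_\alpha\varphi\rangle \\
&= \langle\varphi, P_\beta P_\alpha P_\alpha\varphi\rangle = \langle\varphi, P_\beta P_\alpha\varphi\rangle .
\end{align*}

This computation uses self-adjointness of the projectors $P^* = P$, the projector
property $P^2 = P$, and the commutativity

\[ [P_\alpha, P_\beta] = P_\alpha P_\beta - P_\beta P_\alpha = 0 . \tag{12.45} \]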


Now we can sum over either of the values in (12.44), using (12.7a). But we must 
note that the requirement (12.45) is very special and by no means natural. Think of 
the position and asymptotic velocity PVMs. We have already remarked that they do 
not commute. 

For a discrete example, we may take the “spin measurement” in some direction
$a$, with which we may associate the operator $A_a$. One easily checks that $[A_x, A_y] \neq 0$,
whence their PVMs do not commute. We note, however, that the spin observables
discussed in the EPR–Bohm setup, $a\cdot\sigma^{(1)} \otimes I$ and $I \otimes b\cdot\sigma^{(2)}$, do commute, a fact
which is basic to the insight that, due to quantum equilibrium, Bohmian nonlocality
cannot be used for faster than light signalling. If the PVMs do not commute, then
(12.43) is all there is, and

\[ \mathbb{P}^\varphi(\lambda_\alpha,\mu_\beta) = \|P_\beta P_\alpha\varphi\|^2 \neq \langle\varphi, P_\beta P_\alpha\varphi\rangle \quad \text{in general.} \tag{12.46} \]

Summing now over $\alpha$, (12.44) does not hold in general.
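The failure of (12.44) can be seen in the simplest case of two non-commuting spin PVMs. The sketch below is an illustration under assumed choices: the first apparatus uses the projectors onto the $z$-spin eigenvectors, the second those onto the $x$-spin eigenvectors, and the initial wave function is spin up in the $x$-direction. Real two-component vectors suffice, and all names are hypothetical.

```python
import math


def projector(v):
    """Orthogonal projector |v><v| onto a real unit vector v in R^2."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]


def apply(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def norm_squared(v):
    return v[0] ** 2 + v[1] ** 2


s = 1.0 / math.sqrt(2.0)
z_projectors = {"z+": projector([1.0, 0.0]), "z-": projector([0.0, 1.0])}  # P_alpha
x_projectors = {"x+": projector([s, s]), "x-": projector([s, -s])}         # P_beta

phi = [s, s]  # spin up in the x-direction

# Formula (12.43): first the z apparatus, then the x apparatus.
joint = {(a, b): norm_squared(apply(pb, apply(pa, phi)))
         for a, pa in z_projectors.items() for b, pb in x_projectors.items()}

# Summing over the first outcome versus measuring x directly on phi.
summed_over_alpha = {b: sum(joint[(a, b)] for a in z_projectors) for b in x_projectors}
direct_x = {b: norm_squared(apply(pb, phi)) for b, pb in x_projectors.items()}

# Summing over the second outcome always reproduces the first measurement.
summed_over_beta = {a: sum(joint[(a, b)] for b in x_projectors) for a in z_projectors}
direct_z = {a: norm_squared(apply(pa, phi)) for a, pa in z_projectors.items()}

print(summed_over_alpha, direct_x)
```

All four sequential probabilities equal 1/4, so summing over the $z$ outcome gives 1/2 for each $x$ outcome, whereas a direct $x$ measurement on this wave function gives 1 and 0. The intervening $z$ experiment has changed the effective wave function, and summing over its outcomes does not undo that.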


³ For more than two entries, one speaks of a consistent family of joint distributions, where the
(n−k)-point distribution (with n−k entries) of the family is given by the sum over k entries of the
n-point distribution of the family. For two random outcomes this is (12.44).

Suppose the observables corresponding to the PVMs are $A$, $B$, and suppose we
first “measure” (the reader hopefully understands by now what the notion “measuring
an observable” means) $A = \sum_\alpha \lambda_\alpha P_\alpha$ and then $B = \sum_\beta \mu_\beta P_\beta$. Recall now a basic
fact of linear algebra, where one learns that two Hermitian matrices have a common 
eigenbasis if and only if they commute. The same goes here, i.e., A and B commute 
if and only if their PVMs commute. If A and B do not commute, the sequential mea- 
surement of A, B cannot be described by a PVM and thus not by an observable. But 
it is a measurement! A measurement which is not a measurement of an observable? 
Disaster? Of course not. It is described by a POVM. 

The moral is this. In general, i.e., in the case of non-commuting observables, 
the probability formula (12.43) does not define a joint probability. Only commuting 
observables have joint probabilities. Is there anything deep in all this? Well, no, there 
is not. Any measurement experiment (12.2) channels the system’s wave function 
into orthogonal pieces, thereby leading to a new effective wave function that will be 
the input for the next experiment. Then summing over the possible values of the first 
experiment does not undo the physical change which the systems underwent during 
that experiment. 


12.2.3 Naive Realism about Operators 


The notion of observable and the wording “measurement of an observable” have 
lured quite a few physicists into thinking that a measurement of an observable which 
results in pointing out a value reveals the actual value of a variable, a variable which 
for some reason is, however, not yet part of the theoretical description, i.e., a hid- 
den variable, which is merely represented by the observable. We referred to this in 
Chap. 10 in the context of the EPR argument as measurement of a preexisting value. 
Naively, the observable describes a factual property of the system which is actually 
being measured in an experiment. The following question then arose: Could hidden 
variables be responsible for the outcome of measurements? 

How could one phrase this question in mathematical terms? One way might be 
to ask: Is there a map from observables to random variables which is such that 
the joint statistics of any family of commuting observables are preserved, i.e., the 
corresponding family of random variables has the same joint statistics? The answer 
here is negative. There is no such map. This is sometimes dramatically referred 
to as a no-go theorem. The name “theorem” suggests that it might involve heavy 
mathematical machinery. However, it does not. Actually, the nonlocality proof in 
Chap. 10 is an example. We shall give two arguments which make it clear why the 
no-go theorem is nothing but a simple fact. The first is technical and throws light 
on the strategy of proof, while the second is obvious and shows that one should not 
waste time trying to prove the obvious. 

Take observables $A$, $B$, and $C$, and assume that $A$ commutes with $B$ and $B$ with $C$
but that $A$ does not commute with $C$. Then we have joint probabilities for $A$ and $B$
as well as for $B$ and $C$, but not for $A$ and $C$. Do such observables exist? Yes they do.
An example is the spin observable $a\cdot\sigma$ in the $a$-direction, which does not generally
commute with $b\cdot\sigma$, the spin observable in the $b$-direction. Then as in an EPR
situation, we consider the two-particle spin observables $a\cdot\sigma^{(1)} \otimes I$ and $I \otimes b\cdot\sigma^{(2)}$. These
commute trivially. But $c\cdot\sigma^{(1)} \otimes I$ does not in general commute with $a\cdot\sigma^{(1)} \otimes I$.
Now random variables $X_A$, $X_B$, and $X_C$ always have joint probabilities, no matter
what. That is the trivial conflict on which the no-go theorems are based [1, 3].
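The commutation pattern of such a triple is quickly verified with explicit matrices. The sketch below assumes the particular directions $a = e_z$, $b = e_x$, $c = e_x$ (an illustrative choice, not from the text) and builds the two-particle operators as Kronecker products of integer matrices.

```python
def kron(a, b):
    """Kronecker product of two 2x2 matrices, returned as a 4x4 nested list."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator_size(a, b):
    """Largest absolute entry of the commutator ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return max(abs(ab[i][j] - ba[i][j]) for i in range(n) for j in range(n))


identity = [[1, 0], [0, 1]]
sigma_x = [[0, 1], [1, 0]]
sigma_z = [[1, 0], [0, -1]]

A = kron(sigma_z, identity)  # a.sigma on particle 1, with a = e_z
B = kron(identity, sigma_x)  # b.sigma on particle 2, with b = e_x
C = kron(sigma_x, identity)  # c.sigma on particle 1, with c = e_x

print(commutator_size(A, B), commutator_size(B, C), commutator_size(A, C))
```

The first two commutators vanish because the operators act on different particles, while the third does not, which is exactly the situation described for $A$, $B$, and $C$.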

Trivial it is, but food for mysticism nevertheless. One measures B jointly with 
A, which can be done since the PVMs commute, or one measures B simultane- 
ously with C, and one says that the properties which these observables represent are 
“contextual”, that is, they depend on the context in which they are measured. On 
the one side observables, contextual properties, “non-classical” logic,
complementarity, wave–particle duality, uncertainty, intrinsic probability, cat paradox, no-go
mysticism. And on the other side, the two equations defining Bohmian mechanics, 
governing the whole of the (non-relativistic) world. Which side should physics be 
on? 

We promised a second argument which makes all thoughts about hidden vari- 
ables obsolete. It goes as follows. The observable is a book-keeping device for the 
quantum equilibrium statistics in experiments. The observable is associated with the 
experiment, i.e., there is a map 


\[ \mathscr{E} \Longrightarrow A . \]


The experiment is the “real thing”, even for a quantum hardliner. That, if anything, 
must be real. There is a map from reality to the observable. Can this map be inverted? 
Well, no, of course not, because the map is many-to-one. How would one ever come 
up with the idea that the association of the book-keeping device with the experiment 
could be a one to one correspondence? Only if one thinks that the experiment is truly 
a measurement of the observable. But who would be so naive? 


Remark 12.2. Measuring the Position Operator

Measuring an operator means doing an experiment, the values and the statistics of
which are encoded in $A$, and in particular in the PVM: $\mathscr{E} \Longrightarrow A$. We gave as example
the trivial position PVM of a particle and the corresponding position operator $\hat X$.
Now “measure” $\hat X$. Does that mean that one measures the position of the particle?
No, not by any means. Why should it mean that? It simply means that we carry out
an experiment whose values and statistics are those determined by $\hat X$. We give an
example in Remark 15.4.


12.3 Schrödinger’s Equation Revisited


Should we talk about existence and uniqueness of solutions of the Schrödinger
equation? The answer is that we should not, unless something really catastrophic might
happen. But the Schrödinger equation is linear, hence rather boring (see Remark 7.1
on that). What would be bad in this context? Well, we would be in difficulty if the
quantum equilibrium distribution $\rho = |\psi|^2$ were to be called into question. For recall
the continuity equation (7.17),

\[ \frac{\partial|\psi|^2}{\partial t} = -\nabla\cdot j^\psi = -\frac{\hbar}{2mi}\,\nabla\cdot\big(\psi^*\nabla\psi - \psi\nabla\psi^*\big) . \]

The integral form of this guarantees that the probability always remains normalized
because, by Gauss’ theorem and assuming that the wave function goes to zero at
spatial infinity, we have

\[ \frac{d}{dt}\int |\psi|^2\, d^n x = -\int \nabla\cdot j^\psi\, d^n x = -\int j^\psi\cdot d\sigma = 0 . \tag{12.47} \]


But does (12.47) hold without further requirements? Let us consider a simple
one-dimensional example, namely a “free” particle moving along the half-line $x > 0$.
This means that we consider the Schrödinger equation on the half-line:

\[ i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi}{\partial x^2} , \qquad x \in (0,\infty) , \tag{12.48} \]

and we read (12.47) on the half-line. For notational simplicity, we now put $\hbar/m = 1$.
The solution of (12.48) reads [see (5.9) with $D = 1$ and $it$ instead of $t$]

\[ \psi(x,t) = \int \frac{1}{\sqrt{2\pi i t}}\, \exp\!\left[\frac{i(x-y)^2}{2t}\right] \psi_0(y)\, dy . \tag{12.49} \]


We start with $\psi(y,0) = \psi_0(y)$ as a function which has compact support on the
positive real line $\mathbb{R}^+$, so that $\psi_0(y) = 0$ for $y < 0$. But the solution (12.49) will not be
zero for $x < 0$ (actually as soon as $t > 0$). This is easily seen for large $t$ by recalling
(9.25), viz.,

\[ \psi(x,t) \approx \frac{1}{(it)^{1/2}}\, e^{ix^2/2t}\, \hat\psi_0\!\left(\frac{x}{t}\right) , \]

where $\hat\psi_0$ is the Fourier transform of $\psi_0$. This will not generally be zero for $x < 0$,
since the Fourier transform of a compactly supported function is analytic.

We can also compute the current $j^\psi(0,t)$ through the origin for $t$ large. We see
then that, for some time,

\[ \int_0^\infty |\psi(x,t)|^2\, dx < \int_0^\infty |\psi_0(x)|^2\, dx = 1 . \]

This means that (12.47) is false, i.e., $j^\psi(0,t) \neq 0$. The lesson is that equation (12.48)
does not suffice to capture the idea of a free particle moving along the half-line $\mathbb{R}^+$.
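The loss of probability from the half-line can be made quantitative in a simple case. The sketch below is an illustration under stated assumptions: instead of a compactly supported $\psi_0$ it uses a Gaussian packet at rest, centered at $x_0 = 5$ with position spread $\sigma = 0.5$ (so that initially essentially all of $|\psi_0|^2$ lies in $x > 0$), evolved by the free propagator with $\hbar = m = 1$. For such a packet $|\psi(x,t)|^2$ stays Gaussian with variance $\sigma^2 + t^2/(4\sigma^2)$. The function name and parameter values are hypothetical.

```python
import math


def mass_on_half_line(t, x0=5.0, sigma=0.5):
    """Integral of |psi(x,t)|^2 over x > 0 for a freely spreading Gaussian packet."""
    # Spread of |psi(x,t)|^2 at time t (units with hbar = m = 1).
    sigma_t = sigma * math.sqrt(1.0 + (t / (2.0 * sigma ** 2)) ** 2)
    # Gaussian probability of the half-line x > 0.
    return 0.5 * (1.0 + math.erf(x0 / (sigma_t * math.sqrt(2.0))))


times = [0.0, 1.0, 5.0, 10.0]
masses = [mass_on_half_line(t) for t in times]

for t, mass in zip(times, masses):
    print(t, mass)
```

The mass on the half-line decreases from essentially 1 to roughly 0.69 at $t = 10$: without a boundary condition at the origin the free evolution simply carries probability across it.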

What is missing are the boundary conditions at $x = 0$. In this simple example
the boundary conditions are immediate. The particle trajectory must not cross the
origin, i.e.,

\[ v^\psi \sim \operatorname{Im}\frac{\nabla\psi}{\psi}(x,t) = 0 , \quad \text{at } x = 0 , \]

which means that

\[ \psi(0,t) = 0 \quad \text{or} \quad \nabla\psi(0,t) = \alpha\,\psi(0,t) , \quad \alpha \in \mathbb{R} , \tag{12.50} \]


seems “good” as boundary condition. For each choice of $\alpha$ the condition is linear,
so that it holds for superpositions of wave functions, each of which satisfies the
boundary condition. This is a good point, but one still needs to check that the time
evolution respects the chosen boundary condition. Given $\alpha \in \mathbb{R}$, we require that, at
$t = 0$,

\[ \nabla\psi_0(0) = \alpha\,\psi_0(0) . \]

We then show that $\psi(x,t)$, a solution of (12.48) with $\psi(x,0) = \psi_0(x)$, satisfies

\[ \nabla\psi(0,t) = \alpha\,\psi(0,t) . \]


More important than this, once we understand that boundary conditions are needed,
we understand that the solution of (12.48) with a given initial wave function is not
unique. When the support of $\psi_0(y)$ is away from $y = 0$, $\psi_0(y)$ does not feel the
presence of the boundary. Then there are arbitrarily many solutions $\psi(y,t)$ of (12.48)
with $\psi(y,0) = \psi_0(y)$. Here is another, different from (12.49):

\[ \psi(x,t) = \int \frac{1}{\sqrt{2\pi i t}} \left\{ \exp\!\left[\frac{i(x-y)^2}{2t}\right] - \exp\!\left[\frac{i(x+y)^2}{2t}\right] \right\} \psi_0(y)\, dy , \]

for which we actually have $\psi(0,t) = 0$ for all $t > 0$.

Putting this together we see something nice emerging. The problem of boundary
conditions, which we need for the quantum equilibrium hypothesis to hold true for
all times, goes hand in hand with the existence and uniqueness of solutions of the
Schrödinger equation. Mathematically, the quantum equilibrium hypothesis requires
wave functions to remain normalized during the time evolution. The norm of the
wave function is, according to the geometrical picture we have developed so far, the
unitary one:

\[ \|\psi\| = \sqrt{\langle\psi|\psi\rangle} = \sqrt{\int \psi^*(x)\,\psi(x)\, d^n x} . \]

The time evolution must therefore be given by a group of unitary operators,
generated by the Schrödinger equation. This is all captured in the self-adjointness of
the differential operator on the right-hand side of the Schrödinger equation, the
so-called Hamilton operator.

A related remark is this. We shall see that each choice of $\alpha \in \mathbb{R}$ is a good boundary
condition. This means that for each choice $\alpha \in \mathbb{R}$, we have the physics of the free
particle moving on the half-line. In other words, the phrase “free particle on the 
half-line” does not describe a unique physical system. Only the boundary conditions 
define the physical theory uniquely. 

The reader may well get the impression that we are insisting on an artificial sce- 
nario of a particle moving on a half-line, something which will never occur like 
that in nature, since one needs a potential to constrain a particle, and a boundary 
condition is merely an idealization. This is correct, but the example is very
informative and only prepares the ground for the more realistic physics of atoms. The
Schrödinger equation for a particle in a Coulomb potential $V(x) = -e_1e_2/\|x\|$ reads

\[ i\hbar\frac{\partial\psi}{\partial t}(x,t) = \left(-\frac{\hbar^2}{2m}\Delta - \frac{e_1e_2}{\|x\|}\right)\psi(x,t) , \qquad x \neq 0 , \]

i.e., it is only defined for $x \neq 0$. There is nothing one can do about that. The origin
is a no-go point. We do not want the particle ever to reach the origin, otherwise it 
might vanish there, and that would be the end of quantum mechanics as we know it. 
The singular point x = 0 is a boundary point of the physical theory, and boundary 
conditions are therefore needed. In three dimensions, we can consider the radial 
motion of the particle, and we do not want the radial trajectory to hit zero. We see 
that our half-line example is not so artificial after all. 


12.4 What Comes Next? 


We are ready to go on with the mathematics. We need to give a proper description 
of the vector space of functions which contains the physically relevant wave func- 
tions. We shall introduce the scalar product and with it a norm, and for technical 
reasons we shall want a complete vector space. This is the mathematics of Hilbert 
spaces. We also need to give a proper description of the (boundary) conditions under 
which the Hamilton operator generates a unitary evolution, so that quantum equi- 
librium will be valid. We shall give a more precise introduction of POVMs and 
PVMs, which describe the empirical import of quantum equilibrium, describing the 
relations between PVMs and observables so that we may link them to the textbook 
quantum formalism. 

The one-to-one correspondence is called the spectral theorem. This is technically
very important because it incorporates the diagonalization of the operators and
allows one to compute formulas. We will then have a complete grasp of the free
Schrödinger evolution and understand the structure of the spectrum of Schrödinger
Hamiltonians when an interaction potential is present. A particularly relevant
combination of interaction and free evolution forms the mathematical basis of scattering
theory. In Chap. 16, we shall return to physics to discuss scattering theory (and
more) from the standpoint of Bohmian mechanics.

Hilbert Space


The Schrödinger equation is a linear equation: linear superpositions of solutions
are again solutions. We also need square integrability of the solutions. Thus one is 
naturally led to a vector space of square integrable functions for the space of wave 
functions. It turns out that this space has additional mathematical structure, namely 
the structure provided by an inner product. 


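Definition 13.1. An inner product (scalar product) on a complex vector space $\mathscr{H}$ is
a positive definite sesquilinear form, i.e., a map

\[ \langle\cdot|\cdot\rangle : \mathscr{H}\times\mathscr{H} \to \mathbb{C} \]

with the following properties. Let $\varphi, \psi, \chi \in \mathscr{H}$ and $\alpha \in \mathbb{C}$, then:

(i) $\langle\varphi|\varphi\rangle \ge 0$ and $\langle\varphi|\varphi\rangle = 0 \iff \varphi = 0$,

(ii) $\langle\varphi|\psi+\chi\rangle = \langle\varphi|\psi\rangle + \langle\varphi|\chi\rangle$,

(iii) $\langle\varphi|\alpha\psi\rangle = \alpha\langle\varphi|\psi\rangle$,

(iv) $\langle\varphi|\psi\rangle = \langle\psi|\varphi\rangle^*$.

Property (i) is called positive definiteness, and properties (ii)–(iv) define
sesquilinearity.

Note that (iii) and (iv) imply antilinearity in the first argument, i.e.,

\[ \langle\alpha\varphi|\psi\rangle = \alpha^*\langle\varphi|\psi\rangle . \]

As first examples, we have $\mathbb{C}^n$ with the inner product

\[ \langle z|w\rangle = \sum_{i=1}^n z_i^* w_i , \qquad z, w \in \mathbb{C}^n , \]

and $C([a,b])$, the space of continuous complex-valued functions on the interval
$[a,b]$, with

\[ \langle f|g\rangle = \int_a^b f^*(x)\,g(x)\, dx . \]

The role of the inner product is to introduce the notion of orthogonality of vectors.
Two vectors $\varphi$ and $\psi$ in $\mathscr{H}$ are said to be orthogonal if $\langle\varphi|\psi\rangle = 0$. A sequence
$(\varphi_i)_{i\in\mathbb{N}} \subset \mathscr{H}$ is called an orthonormal sequence if its elements are pairwise
orthogonal and normalized, i.e., $\langle\varphi_i|\varphi_j\rangle = \delta_{ij}$.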
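The defining properties of the inner product are easily checked for the example $\mathbb{C}^n$. The following sketch uses three arbitrarily chosen vectors in $\mathbb{C}^3$ and a scalar; the vectors, the scalar and the helper names are assumptions made for illustration.

```python
def inner(z, w):
    """<z|w> = sum_i conj(z_i) w_i on C^n, antilinear in z and linear in w."""
    return sum(zi.conjugate() * wi for zi, wi in zip(z, w))


def add(u, v):
    """Componentwise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Multiply the vector v by the complex number c."""
    return [c * a for a in v]


phi = [1 + 2j, -1j, 0.5 + 0j]
psi = [2 - 1j, 3 + 0j, 1j]
chi = [0j, 1 + 1j, -2 + 0j]
alpha = 0.5 - 1.5j
tol = 1e-12

checks = {
    "(i) positivity": inner(phi, phi).real > 0 and abs(inner(phi, phi).imag) < tol,
    "(ii) additivity": abs(inner(phi, add(psi, chi))
                           - (inner(phi, psi) + inner(phi, chi))) < tol,
    "(iii) homogeneity": abs(inner(phi, scale(alpha, psi))
                             - alpha * inner(phi, psi)) < tol,
    "(iv) conjugate symmetry": abs(inner(phi, psi)
                                   - inner(psi, phi).conjugate()) < tol,
    "antilinearity": abs(inner(scale(alpha, phi), psi)
                         - alpha.conjugate() * inner(phi, psi)) < tol,
}

for name, holds in checks.items():
    print(name, holds)
```

Note that the complex conjugate sits on the first argument, which is the convention of Definition 13.1.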

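Next we consider orthogonal decompositions. Let $(\varphi_i)_{i\in\mathbb{N}}$ be an arbitrary
orthonormal sequence. Then, for all $\varphi \in \mathscr{H}$ and each $N \in \mathbb{N}$, the decomposition

\[ \varphi = \underbrace{\sum_{i=1}^N \langle\varphi_i|\varphi\rangle\,\varphi_i}_{\psi_N} + \underbrace{\varphi - \sum_{i=1}^N \langle\varphi_i|\varphi\rangle\,\varphi_i}_{\psi_N^\perp} \tag{13.1} \]

is orthogonal, meaning that $\psi_N$ and $\psi_N^\perp$ are orthogonal. Hence

\[ \|\varphi\|^2 = \langle\varphi|\varphi\rangle = \sum_{i=1}^N |\langle\varphi_i|\varphi\rangle|^2 + \Big\|\varphi - \sum_{i=1}^N \langle\varphi_i|\varphi\rangle\,\varphi_i\Big\|^2 \]

and (i) implies

\[ \|\varphi\|^2 \ge \sum_{i=1}^N |\langle\varphi_i|\varphi\rangle|^2 \qquad \text{(Bessel inequality)} . \tag{13.2} \]

For $\varphi_1 = \psi/\|\psi\|$ and $N = 1$, we obtain as a special case the important Schwarz
inequality:

\[ |\langle\varphi|\psi\rangle| \le \|\varphi\|\,\|\psi\| , \qquad \forall\, \varphi, \psi \in \mathscr{H} . \tag{13.3} \]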


We can now easily show that $\|\cdot\| = \sqrt{\langle\cdot|\cdot\rangle}$ defines a norm,¹ which turns $\mathscr{H}$ into a
normed linear space. It is positive definite and homogeneous by Definition 13.1 (i),
(iii), and (iv), and the triangle inequality

\[ \|\varphi+\psi\| \le \|\varphi\| + \|\psi\| \]

follows from the Schwarz inequality as an easy exercise.
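The orthogonal decomposition and the inequalities derived from it can be traced in a small numerical case. The sketch below works in $\mathbb{C}^3$ with an orthonormal pair of vectors ($N = 2$); the particular vectors and the helper names are assumptions chosen for illustration.

```python
import math


def inner(z, w):
    """<z|w> = sum_i conj(z_i) w_i on C^n."""
    return sum(zi.conjugate() * wi for zi, wi in zip(z, w))


def norm(z):
    """Norm induced by the inner product."""
    return math.sqrt(inner(z, z).real)


s = 1.0 / math.sqrt(2.0)
basis = [[1 + 0j, 0j, 0j], [0j, s + 0j, 1j * s]]  # orthonormal pair, N = 2
phi = [1 + 2j, -1j, 0.5 + 0j]
psi = [2 - 1j, 3 + 0j, 1j]

# Decomposition (13.1): phi = psi_n + remainder.
coefficients = [inner(e, phi) for e in basis]
psi_n = [sum(c * e[i] for c, e in zip(coefficients, basis)) for i in range(3)]
remainder = [phi[i] - psi_n[i] for i in range(3)]

bessel_sum = sum(abs(c) ** 2 for c in coefficients)  # right-hand side of (13.2)
schwarz_lhs = abs(inner(phi, psi))                   # left-hand side of (13.3)
schwarz_rhs = norm(phi) * norm(psi)
triangle_lhs = norm([a + b for a, b in zip(phi, psi)])

print(bessel_sum, norm(phi) ** 2, schwarz_lhs, schwarz_rhs)
```

In this example the Bessel sum is 6.125 against $\|\varphi\|^2 = 6.25$, the difference being the squared norm of the remainder, which is orthogonal to $\psi_N$.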

Such a normed linear space with inner product is very similar to $\mathbb{R}^n$ (or $\mathbb{C}^n$). What
is missing? Analysis in $\mathbb{R}$ is based on completeness. Having completeness means not
having to worry about infinite sequences and limits. However, there is a price to pay.
In the case of wave functions, the price of the comfort of having completeness, or
better, of considering the norm closure of the space of physical wave functions, is
rather high. As we shall see, most elements in the completion of the vector space
of wave functions are irrelevant and abstract. They cannot be considered as wave 
functions, i.e., as physical states. They are not differentiable, nor even continuous. 


¹ Recall that a norm $\|\cdot\|$ on a vector space $V$ is a map $\|\cdot\| : V \to [0,\infty)$ such that (i) $\|v\| = 0 \iff v = 0$,
(ii) $\|\alpha v\| = |\alpha|\,\|v\|$ for all $\alpha \in \mathbb{C}$, $v \in V$, and (iii) $\|v+w\| \le \|v\| + \|w\|$ for all $v, w \in V$.

13.1 The Hilbert Space L²


A complete normed space is called a Hilbert space if its norm is given by an
inner product, $\|\cdot\| = \sqrt{\langle\cdot|\cdot\rangle}$. Completeness means that every Cauchy sequence
$(\varphi_n)_{n\in\mathbb{N}} \subset \mathscr{H}$ with respect to the norm $\|\cdot\|$ in $\mathscr{H}$ converges in $\mathscr{H}$. Obvious
examples are $\mathbb{R}^n$ and $\mathbb{C}^n$. But we are interested in completeness of the space of wave
functions with respect to the integral norm

\[ \|\psi\|^2 = \int |\psi(x)|^2\, d^n x . \]

Here the integral must be understood as a Lebesgue integral, otherwise the
resulting normed space of square integrable functions would not be complete, i.e., there
would exist Cauchy sequences that did not converge to a Riemann square integrable
function. But with the Lebesgue integral we obtain (as will be shown) a Hilbert
space

\[ L^2(\mathbb{R}^n, d^n x) = \left\{ \varphi : \mathbb{R}^n \to \mathbb{C} \text{ measurable with } \|\varphi\| = \left(\int |\varphi|^2\, d^n x\right)^{1/2} < \infty \right\} \]

with inner product given by

\[ \langle\varphi|\psi\rangle = \int \varphi^*(x)\,\psi(x)\, d^n x . \]

Note that the Schwarz inequality (13.3) implies $|\langle\varphi|\psi\rangle| < \infty$ for all $\varphi, \psi \in L^2$.


Remark 13.1. About Equality in $L^2$

The notion of equality of elements in $L^2$ is a rather special one. The relation $\varphi = \psi$
in $L^2$ means that $\|\varphi - \psi\| = 0$, which in turn means that $\varphi = \psi$ almost surely,² i.e.,
the equality $\varphi(x) = \psi(x)$ might be violated for a Lebesgue null set of $x$ values. The
elements of $L^2$ are thus actually equivalence classes of functions. Each equivalence
class consists of functions which are equal almost everywhere. But this need not
worry us most of the time. We just go on talking about “functions” $f \in L^2$, although
they are actually equivalence classes of functions. Most of the time the distinction
between functions and equivalence classes is irrelevant. It is only if one needs
pointwise evaluations of functions that one has to be careful. In general $f(x)$ makes no
sense for an element of $L^2$, but of course $\int |f(x)|^2\, dx$ does. If an equivalence class
contains a continuous or even differentiable representative (which is then unique),
one naturally associates the class with this special function and pointwise evaluation
becomes meaningful again.


The fact that $L^2$ is complete and thus a Hilbert space, and indeed the completeness
of any $L^p$ space,

\[ L^p(\mathbb{R}^n, d^n x) = \left\{ \varphi : \mathbb{R}^n \to \mathbb{C} \text{ measurable with } \|\varphi\|_p = \left(\int |\varphi|^p\, d^n x\right)^{1/p} < \infty \right\} , \]

is the content of the Riesz–Fischer theorem:


Theorem 13.1. The space $L^2(\mathbb{R}^n, d^n x)$, and indeed every space $L^p(\mathbb{R}^n, d^n x)$ with
$1 \le p < \infty$, is complete, i.e., every Cauchy sequence with respect to the norm in $L^p$
converges in the norm to an element of $L^p$.


² “Almost surely” is synonymous with “almost everywhere”.


Only when $p = 2$ is the norm given by an inner product, which makes this value special. The idea of the proof is simple. Convergence of a subsequence of a Cauchy sequence implies convergence of the whole sequence. Thus we can pick a subsequence $(\varphi_k)_{k\in\mathbb{N}} = (\varphi_{n_k})_{k\in\mathbb{N}}$ of a given Cauchy sequence $(\varphi_n)_{n\in\mathbb{N}}$, such that

$$\|\varphi_k - \varphi_{k+1}\| < 2^{-k} . \tag{13.4}$$

Then choose any sequence of representatives $\varphi_k(x)$ and put

$$\psi_m(x) = \sum_{k=1}^{m-1} |\varphi_k(x) - \varphi_{k+1}(x)|$$

and $\psi_\infty(x) = \lim_{m\to\infty} \psi_m(x)$, which may be infinite. [Note that $(\psi_m(x))_{m\in\mathbb{N}}$ is monotonically increasing.] But because of (13.4) and the triangle inequality, we have $\|\psi_m\| < 1$, i.e.,

$$\int |\psi_m|^2 \, d^n x < 1 .$$

Lebesgue's theorem of monotone convergence implies that $|\psi_\infty|^2$ is integrable, i.e., that $\psi_\infty \in L^2$. Hence $|\psi_\infty|^2$ can be infinite only on a Lebesgue null set, and we have pointwise convergence of

$$\varphi_m = \varphi_1 + \sum_{k=1}^{m-1} (\varphi_{k+1} - \varphi_k)$$

almost everywhere to a function $\varphi$. Since

$$|\varphi_m|^2 \le 2\big(|\varphi_1|^2 + |\psi_m|^2\big) \le 2\big(|\varphi_1|^2 + |\psi_\infty|^2\big) \in L^1 , \tag{13.5}$$

we can apply Lebesgue's theorem of dominated convergence to conclude that $|\varphi|^2 \in L^1$, and hence $\varphi \in L^2$. The sequence $h_m := \varphi_m - \varphi$ converges almost surely to zero. From (13.5), it follows that

$$|h_m|^2 \le 2\big(|\varphi_m|^2 + |\varphi|^2\big) \le 2\big[2\big(|\varphi_1|^2 + |\psi_\infty|^2\big) + |\varphi|^2\big] \in L^1 ,$$

and a second application of dominated convergence gives $|h_m|^2 \to 0$ in $L^1$, and finally

$$h_m \to 0 \ \text{in}\ L^2 \quad\Longrightarrow\quad \varphi_m \to \varphi \ \text{in}\ L^2 .$$
13.1.1 The Coordinate Space $\ell^2$


While completeness allows us to do analysis, the inner product structure allows us to transfer our geometric intuition from finite-dimensional spaces to infinite-dimensional ones. For us (and for almost everybody else), only so-called separable Hilbert spaces are interesting, because they admit countable orthonormal bases. An orthonormal sequence $(\varphi_n)_{n\in\mathbb{N}} \subset \mathcal{H}$ is an orthonormal basis if every $\varphi \in \mathcal{H}$ can be represented through an orthonormal decomposition with respect to $(\varphi_n)_{n\in\mathbb{N}}$:

$$\varphi = \sum_{k\in\mathbb{N}} \langle \varphi_k | \varphi \rangle \, \varphi_k . \tag{13.6}$$

Here the equality means that the series on the right-hand side converges in the norm of $\mathcal{H}$ to $\varphi$.


Remark 13.2. The Notion of Separability

A topological space is called separable if it contains a countable dense subset. Hence $\mathcal{H}$ is separable if there is a sequence $(\psi_n)_{n\in\mathbb{N}}$ such that, for every element $\varphi \in \mathcal{H}$ and $\varepsilon > 0$, there is an index $n \in \mathbb{N}$ with $\|\psi_n - \varphi\| < \varepsilon$. But this is clearly equivalent to the existence of a countable orthonormal basis: by inductively removing those elements from $(\psi_n)_{n\in\mathbb{N}}$ which are finite linear combinations of the preceding ones, we can construct a linearly independent subsequence $(\psi_{n_k})_{k\in\mathbb{N}}$ such that $(\psi_n)_{n\in\mathbb{N}}$ can be recovered from finite linear combinations of elements in $(\psi_{n_k})_{k\in\mathbb{N}}$, i.e., such that $\operatorname{span}(\psi_{n_k})_{k\in\mathbb{N}} = \operatorname{span}(\psi_n)_{n\in\mathbb{N}}$. The Gram-Schmidt orthonormalization procedure applied to $(\psi_{n_k})_{k\in\mathbb{N}}$ finally yields an orthonormal basis $(\varphi_k)_{k\in\mathbb{N}}$. On the other hand we can construct a countable dense set from an orthonormal basis $(\varphi_k)_{k\in\mathbb{N}}$ by approximating the complex coefficients $\langle \varphi_k | \varphi \rangle$ by rational (complex) coefficients. (Recall that countable unions of countable sets are countable.) □


It is in general not easy to check whether (13.6) holds for a given orthonormal sequence. However, using the fact that $\varphi - \sum_k \langle \varphi_k | \varphi \rangle \varphi_k$ must be zero gives the following more practical reformulation of (13.6).


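Remark 13.3. On Orthonormal Bases
An orthonormal sequence $(\varphi_k)_{k\in\mathbb{N}}$ is an orthonormal basis if and only if

$$\langle \varphi_k | \varphi \rangle = 0 \ \text{ for all } k \in \mathbb{N} \quad\Longrightarrow\quad \varphi = 0 . \tag{13.7}$$

To see that (13.7) implies (13.6), note that (13.2) implies, for arbitrary $N$,

$$\|\varphi\|^2 \ge \sum_{k=1}^{N} |\langle \varphi_k | \varphi \rangle|^2 .$$

Hence $(\tilde\varphi_N)_{N\in\mathbb{N}}$ with

$$\tilde\varphi_N = \sum_{k=1}^{N} \langle \varphi_k | \varphi \rangle \, \varphi_k$$

is a Cauchy sequence which converges to some $\tilde\varphi$. But for all $k$, we have

$$\langle \tilde\varphi - \varphi | \varphi_k \rangle = \lim_{N\to\infty} \langle \tilde\varphi_N - \varphi | \varphi_k \rangle = \lim_{N\to\infty} \Big( \sum_{l=1}^{N} \langle \varphi_l | \varphi \rangle^* \langle \varphi_l | \varphi_k \rangle - \langle \varphi | \varphi_k \rangle \Big) = \langle \varphi_k | \varphi \rangle^* - \langle \varphi | \varphi_k \rangle = 0 .$$

With (13.7), we may thus conclude that $\tilde\varphi = \varphi$. □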


From (13.6) it follows that, for an orthonormal basis, the Bessel inequality (13.2) turns into an equality, called the Pythagorean theorem, or in this context, the Parseval equality:

$$\|\varphi\|^2 = \sum_{k=1}^{\infty} |\langle \varphi_k | \varphi \rangle|^2 \qquad \text{(Parseval equality)} . \tag{13.8}$$

This motivates the introduction of a further important Hilbert space, which is the coordinate representation of any separable Hilbert space: $\ell^2$ is the space of all square-summable sequences, viz.,

$$\ell^2 = \Big\{ (x_n)_{n\in\mathbb{N}} : x_n \in \mathbb{C}, \ \sum_{n=1}^{\infty} |x_n|^2 < \infty \Big\} ,$$

with the inner product

$$\langle x | y \rangle = \sum_{n=1}^{\infty} x_n^* \, y_n .$$

It is not a very difficult exercise in analysis to show that this space is complete, and we skip the proof here.

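From (13.6) and (13.8), we conclude that any separable infinite-dimensional Hilbert space $\mathcal{H}$ is isomorphic to $\ell^2$, $\mathcal{H} \cong \ell^2$, i.e., there exists a bijective linear isometry $U : \mathcal{H} \to \ell^2$. Such an operator $U$ is said to be unitary, and is characterized by the property that $U$ is surjective and isometric, i.e., for all $\varphi \in \mathcal{H}$,

$$\|U\varphi\|_{\ell^2} = \|\varphi\|_{\mathcal{H}} .$$

Note that isometries are always injective, and it follows from the polarization identity,

$$\langle \varphi | \psi \rangle = \frac{1}{4} \Big[ \big( \|\varphi + \psi\|^2 - \|\varphi - \psi\|^2 \big) - i \big( \|\varphi + i\psi\|^2 - \|\varphi - i\psi\|^2 \big) \Big] , \tag{13.9}$$

that any isometry, and hence any unitary operator, satisfies

$$\langle U\varphi | U\psi \rangle_{\ell^2} = \langle \varphi | \psi \rangle_{\mathcal{H}} , \qquad \forall \varphi, \psi \in \mathcal{H} . \tag{13.10}$$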
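The polarization identity (13.9) is easy to check numerically in a finite-dimensional stand-in for $\mathcal{H}$. The following is a minimal sketch, assuming vectors in $\mathbb{C}^n$ stored as Python lists and an inner product that is antilinear in the first argument; the helper names `inner`, `norm_sq` and `polarization` are illustrative and not taken from the text.

```python
def inner(phi, psi):
    """Inner product on C^n, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def norm_sq(phi):
    """Squared norm induced by the inner product."""
    return inner(phi, phi).real

def combine(u, v, c):
    """Return the vector u + c*v."""
    return [a + c * b for a, b in zip(u, v)]

def polarization(phi, psi):
    """Right-hand side of the polarization identity, built from norms only."""
    real_part = norm_sq(combine(phi, psi, 1)) - norm_sq(combine(phi, psi, -1))
    imag_part = norm_sq(combine(phi, psi, 1j)) - norm_sq(combine(phi, psi, -1j))
    return 0.25 * (real_part - 1j * imag_part)
```

Since the right-hand side involves norms only, any map that preserves norms must also preserve inner products, which is the step from isometry to (13.10).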


For any orthonormal basis $(\varphi_k)_{k\in\mathbb{N}}$ of $\mathcal{H}$, the coordinate map

$$U : \mathcal{H} \to \ell^2 , \qquad \varphi \mapsto U\varphi = \big( \langle \varphi_k | \varphi \rangle \big)_{k\in\mathbb{N}}$$

is such a unitary operator. It is surjective by definition and an isometry by Parseval (13.8). The Fourier series of a function in $L^2([0,2\pi])$ is a prominent example of such a coordinate representation. The elements of the orthonormal basis are

$$\frac{1}{\sqrt{2\pi}} \, e^{ikx} , \qquad k \in \mathbb{Z} ,$$

and the coefficients of the series are the coordinate sequence in $\ell^2$.

As we will try to understand in the following, $L^2(\mathbb{R}^n, d^n x)$ is also separable. This might be surprising at first sight, since the elements $\varphi(x)$, $x \in \mathbb{R}^n$, could be naively interpreted as "uncountable vectors" $\varphi_x$ (with $x$ as index like the $i$ in $y_i$ for $y \in \mathbb{R}^n$, $i = 1, \dots, n$). From this analogy one would expect uncountable bases for $L^2(\mathbb{R}^n, d^n x)$. [Think also of (9.27), where we wrote $\psi = \int |x\rangle \langle x | \psi \rangle \, dx$.] Of course, this intuition would also apply to $L^2([0,2\pi])$, where we have already seen that it is wrong. Indeed, the arbitrary assignment of a value $\varphi_x$ to every $x$ would typically yield very irregular functions, and functions in $L^2$ are not that irregular. However, note that the space of essentially bounded functions $L^\infty([0,2\pi])$ is not separable, although $L^\infty([0,2\pi]) \subset L^2([0,2\pi])$. So separability really depends on the norm, i.e., on how we measure the distance between functions.

We now show that $L^2(\mathbb{R}, dx)$ is separable by explicitly constructing a countable orthonormal basis. Then separability of $L^2(\mathbb{R}^n, d^n x)$ follows because of the tensor product structure, which we discuss later on. We will see that

$$\overline{\operatorname{span}} \big\{ x^n e^{-x^2/2} , \ n \in \mathbb{N}_0 \big\} = L^2(\mathbb{R}, dx) , \tag{13.11}$$

which is a natural guess, since the monomials $x^n$ are linearly independent and $e^{-x^2/2}$ makes them square integrable. The Gram-Schmidt orthogonalization procedure turns this linearly independent sequence into an orthonormal sequence

$$H_n(x) = P_n(x) \, e^{-x^2/2} , \qquad n \in \mathbb{N}_0 ,$$

where the $H_n$ are called Hermite functions and the $P_n$ are polynomials of degree $n$, the Hermite polynomials. According to Remark 13.3, for $(H_n)_{n\in\mathbb{N}_0}$ to be a basis as claimed, we must have

$$\forall n \in \mathbb{N}_0 , \quad \langle H_n | \varphi \rangle = 0 \quad\Longrightarrow\quad \varphi = 0 . \tag{13.12}$$

But since the $P_n$ are polynomials of degree $n$, (13.12) is equivalent to the requirement that the original system is a basis, i.e., that

$$\forall n \in \mathbb{N}_0 , \quad \langle x^n e^{-x^2/2} | \varphi \rangle = 0 \quad\Longrightarrow\quad \varphi = 0 , \tag{13.13}$$

and we now show (13.13). Assuming that

$$\forall n \in \mathbb{N}_0 , \quad \langle x^n e^{-x^2/2} | \varphi \rangle = 0 ,$$

then also

$$\forall k \in \mathbb{R} , \quad \langle e^{ikx} e^{-x^2/2} | \varphi \rangle = 0 ,$$

since

$$e^{ikx} e^{-x^2/2} = \sum_{m=0}^{\infty} \frac{(ik)^m}{m!} \, x^m e^{-x^2/2} , \tag{13.14}$$

in the $L^2$ sense. So the Fourier transform of $e^{-x^2/2} \varphi$ must be identically zero:

$$\frac{1}{\sqrt{2\pi}} \langle e^{ikx} e^{-x^2/2} | \varphi \rangle = \mathcal{F}\big( e^{-x^2/2} \varphi \big)(k) = 0 .$$

As we shall see in the next section, the Fourier transformation $\mathcal{F} : L^2 \to L^2$ is a unitary map, and hence $e^{-x^2/2}\varphi = 0$, i.e., $\varphi = 0$. So $L^2(\mathbb{R})$ is separable.
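The Gram-Schmidt construction of the Hermite functions can be tried out numerically. The sketch below is only an illustration under stated assumptions: functions are sampled on a uniform grid over $[-10, 10]$ (an arbitrary truncation), the $L^2$ inner product is replaced by a Riemann sum, and the names `make_grid`, `inner` and `hermite_functions` are hypothetical helpers.

```python
import math

def make_grid(half_width=10.0, n=4001):
    """Uniform grid on [-half_width, half_width]; returns points and spacing."""
    h = 2 * half_width / (n - 1)
    return [-half_width + i * h for i in range(n)], h

def inner(f, g, h):
    """Riemann-sum approximation of the real L^2 inner product of sampled functions."""
    return h * sum(a * b for a, b in zip(f, g))

def hermite_functions(count, xs, h):
    """Gram-Schmidt applied to x^n exp(-x^2/2), n = 0, ..., count-1."""
    basis = []
    for n in range(count):
        v = [x**n * math.exp(-x * x / 2) for x in xs]
        for e in basis:
            c = inner(e, v, h)                       # component along e
            v = [a - c * b for a, b in zip(v, e)]    # remove it
        norm = math.sqrt(inner(v, v, h))
        basis.append([a / norm for a in v])
    return basis
```

For example, `hermite_functions(5, *make_grid())` returns samples of $H_0, \dots, H_4$, and the first of them approximates $\pi^{-1/4} e^{-x^2/2}$. The sketch illustrates orthonormality only; that the sequence is complete is the content of the argument via (13.13).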


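Remark 13.4. Proof of Convergence
Since it is not completely obvious that (13.14) really converges in $L^2$, we give the argument in this remark. The sequence of partial sums

$$\sum_{m=0}^{M} \frac{(ik)^m}{m!} \, x^m e^{-x^2/2}$$

is a Cauchy sequence, because for each $k \in \mathbb{R}$ we have

$$\Big\| \frac{(ik)^m}{m!} \, x^m e^{-x^2/2} \Big\|^2 = \frac{k^{2m}}{(m!)^2} \int_{-\infty}^{\infty} x^{2m} e^{-x^2} \, dx = \frac{2^m k^{2m}}{m!} \int_{-\infty}^{\infty} \frac{1}{m!} \Big( \frac{x^2}{2} \Big)^m e^{-x^2/2} \, e^{-x^2/2} \, dx \le \frac{2^m k^{2m}}{m!} \int_{-\infty}^{\infty} e^{-x^2/2} \, dx = \sqrt{2\pi} \, \frac{(2k^2)^m}{m!} ,$$

where we used $\frac{1}{m!} (x^2/2)^m \le e^{x^2/2}$. The norms of the individual terms are therefore summable over $m$. So the series converges, not only pointwise, but also in $L^2$. □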


13.1.2 Fourier Transformation on $L^2$


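On $L^2([0,2\pi])$, the Fourier transformation is just the orthogonal decomposition with respect to the orthonormal basis $(e_k)_{k\in\mathbb{Z}}$ with $e_k(x) = e^{ikx}/\sqrt{2\pi}$:

$$\mathcal{F} : L^2([0,2\pi]) \to \ell^2 , \qquad f \mapsto (\hat f_k)_{k\in\mathbb{Z}} = \big( \langle e_k | f \rangle \big)_{k\in\mathbb{Z}} .$$

But the plane waves $e^{ikx}$ are not square integrable over $\mathbb{R}$ and thus, in particular, do not form a basis of $L^2(\mathbb{R})$. The Fourier transformation is now a unitary map

$$\mathcal{F} : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n) , \qquad f \mapsto \hat f ,$$

with

$$\langle \hat g | \hat f \rangle = \langle g | f \rangle \qquad \text{(Plancherel equality)} , \tag{13.15}$$

where the Fourier transform of integrable functions is defined as usual by

$$\mathcal{F} f = \hat f(k) = \Big( \frac{1}{2\pi} \Big)^{n/2} \int e^{-ik\cdot x} f(x) \, d^n x \tag{13.16}$$

and

$$\mathcal{F}^* \hat f = \mathcal{F}^{-1} \hat f = f(x) = \Big( \frac{1}{2\pi} \Big)^{n/2} \int e^{ik\cdot x} \hat f(k) \, d^n k . \tag{13.17}$$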


Integrable functions ($f$ or $\hat f$) are $L^1$-functions. While on bounded domains, $L^2$ is contained in $L^1$, this no longer holds on unbounded domains. Thus for some $L^2$-functions, we cannot define the Fourier transform by (13.16). In order to define Fourier transformation on all of $L^2(\mathbb{R}^n)$, one first analyses the behavior of "nice" functions under the Fourier transformation, as defined by the above integrals. For "not so nice" $L^2$-functions, we define the transformation by approximating with nice functions, which must therefore be dense.

It is convenient to begin by analyzing the Fourier transformation on "very nice" functions. The Schwartz space of rapidly decaying smooth functions $\mathcal{S}(\mathbb{R}^n) \subset L^2(\mathbb{R}^n)$ is the space of all $C^\infty$-functions (functions for which all partial derivatives of any order exist and are continuous), which decay faster than any inverse polynomial as $|x| \to \infty$, and for which the same holds for all their partial derivatives. Since $C_0^\infty(\mathbb{R}^n)$, the space of smooth compactly supported functions, is contained in $\mathcal{S}(\mathbb{R}^n)$ and dense in $L^2(\mathbb{R}^n)$, the space $\mathcal{S}(\mathbb{R}^n)$ is also dense in $L^2(\mathbb{R}^n)$.

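Schwartz functions are certainly integrable, and $\mathcal{S}(\mathbb{R}^n)$ is invariant under the Fourier transformation [1-3]:³

$$\mathcal{F}^* \mathcal{F} = \mathcal{F} \mathcal{F}^* = \mathrm{id}_{\mathcal{S}} . \tag{13.18}$$

³ The reference [1] is highly recommended reading.

This follows in a rather straightforward way from the definition. To get (13.18), we must first show that arbitrary partial derivatives $D_k^\beta \hat f(k)$ of $\hat f(k)$ multiplied by any polynomial $k^\alpha$ remain bounded. Here we use the multi-index notation

$$\alpha \in \mathbb{N}_0^n , \qquad k^\alpha = k_1^{\alpha_1} \cdots k_n^{\alpha_n} , \qquad D_k^\alpha = \frac{\partial^{|\alpha|}}{\partial k_1^{\alpha_1} \cdots \partial k_n^{\alpha_n}} , \qquad |\alpha| = \alpha_1 + \cdots + \alpha_n .$$

Referring to dominated convergence, we can differentiate the integral by differentiating the integrand to obtain

$$k^\alpha D_k^\beta \hat f(k) = \Big( \frac{1}{2\pi} \Big)^{n/2} \int k^\alpha (-ix)^\beta e^{-ik\cdot x} f(x) \, d^n x = \Big( \frac{1}{2\pi} \Big)^{n/2} \int (-i)^{|\alpha|} e^{-ik\cdot x} \, D_x^\alpha \big[ (-ix)^\beta f(x) \big] \, d^n x ,$$

integrating $|\alpha|$-times by parts to get the last equality. Hence

$$\sup_k \big| k^\alpha D_k^\beta \hat f(k) \big| \le \Big( \frac{1}{2\pi} \Big)^{n/2} \int \big| D_x^\alpha \big( x^\beta f(x) \big) \big| \, d^n x .$$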


From this one easily concludes that $\mathcal{F}, \mathcal{F}^* : \mathcal{S} \to \mathcal{S}$. We still need to show that $\mathcal{F}^* = \mathcal{F}^{-1}$. To get this, we consider compactly supported smooth functions, i.e., functions in $C_0^\infty$, because we can continue such a function periodically and consider its Fourier series. The Fourier series of a smooth function converges uniformly to the function. Thus the idea is to approximate functions $f$ from $\mathcal{S}$ by sequences $(f_k)$ from $C_0^\infty$. However, to conclude this density argument, we need continuity of $\mathcal{F}$ as a map from $\mathcal{S}$ to $\mathcal{S}$ in a suitable sense.

In order to define a suitable notion of convergence of sequences in $\mathcal{S}$, we continue the above estimate and introduce the following family of seminorms ("semi" means that $\|f\| = 0$ does not necessarily imply $f = 0$):

$$\|f\|_{\alpha,\beta} := \sup_{x\in\mathbb{R}^n} \big| x^\alpha D_x^\beta f(x) \big| , \qquad \text{for arbitrary multi-indices } \alpha, \beta .$$


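Then the previous estimate yields (one should remember the following trick for later use)

$$\|\mathcal{F} f\|_{\alpha,\beta} = \|\hat f\|_{\alpha,\beta} \le \Big( \frac{1}{2\pi} \Big)^{n/2} \int \big| D_x^\alpha \big( x^\beta f(x) \big) \big| \, d^n x$$

$$= \Big( \frac{1}{2\pi} \Big)^{n/2} \int \frac{1 + |x|^{2n}}{1 + |x|^{2n}} \, \big| D_x^\alpha \big( x^\beta f(x) \big) \big| \, d^n x$$

$$\le \Big( \frac{1}{2\pi} \Big)^{n/2} \sup_x \Big| \big( 1 + |x|^{2n} \big) D_x^\alpha \big( x^\beta f(x) \big) \Big| \int \frac{1}{1 + |x|^{2n}} \, d^n x$$

$$\le C(\alpha,\beta) \sum_{|\gamma| \le |\beta| + 2n , \ |\delta| \le |\alpha|} \|f\|_{\gamma,\delta} . \tag{13.19}$$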

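The same holds for $\mathcal{F}^*$. (This family of seminorms defines a metric⁴ on $\mathcal{S}$. Convergence of sequences in $\mathcal{S}$ with respect to this metric is equivalent to convergence with respect to $\|\cdot\|_{\alpha,\beta}$ for all $\alpha$ and $\beta$. For details see, for example, [2, 3]. However, this is not really important for the following.) For us it is important that we can approximate $f \in \mathcal{S}$ by $f_0 \in C_0^\infty$ in such a way that $\mathcal{F}(f - f_0)$ is also small. More precisely, we want to show that

$$\mathcal{F}^* \mathcal{F} f = f , \quad \text{or} \quad \mathcal{F}^* = \mathcal{F}^{-1} , \quad \text{on } \mathcal{S} .$$

We will show that

$$\mathcal{F}^* \mathcal{F} f_0 = f_0 , \qquad f_0 \in C_0^\infty .$$

Then a simple triangulation together with (13.19) (for $\mathcal{F}^*$ and $\mathcal{F}$) yields

$$\|\mathcal{F}^* \mathcal{F} f - f\|_\infty \le \|\mathcal{F}^* \mathcal{F} f - \mathcal{F}^* \mathcal{F} f_0\|_\infty + \|\mathcal{F}^* \mathcal{F} f_0 - f_0\|_\infty + \|f_0 - f\|_\infty \le C \sum_{|\gamma| \le 2n} \|\mathcal{F}(f - f_0)\|_{\gamma,0} + \|f - f_0\|_{0,0} ,$$

where the middle term vanishes. By (13.19) for $\mathcal{F}$, the right-hand side is in turn bounded by finitely many seminorms of $f - f_0$. Hence we need to approximate $f$ by $f_0$ with respect to these seminorms. To do this we first define a $C_0^\infty$-function $\Phi$ that is a "smooth version" of the characteristic function of the unit ball: $\Phi(x) = 1$ for $|x| \le 1$, $\Phi(x) = 0$ for $|x| \ge 2$, and $\Phi$ is smooth and monotonic for $1 < |x| < 2$. In dimension $n = 1$, this looks like a snake that swallowed an elephant. It is easy to construct such a function $\Phi$ using, e.g.,

$$\exp\Big[ - \frac{1}{1 - y} \exp\Big( - \frac{1}{y} \Big) \Big] , \qquad 0 < y < 1 ,$$

which smoothly interpolates the values 1 for $y \le 0$ and 0 for $y \ge 1$. Then the functions $\Phi_m(x) := \Phi(x/m)$ are 1 on the ball of radius $m$ and have derivatives of order $1/m$. Hence the sequence $f \Phi_m$ in $C_0^\infty$ approximates $f$ with respect to all the seminorms $\|\cdot\|_{\alpha,\beta}$.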
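The cutoff functions $\Phi_m$ can be written down in a few lines. The sketch below assumes one particular smooth transition function, $\exp\big(-e^{-1/y}/(1-y)\big)$ on $0 < y < 1$, which has the required properties (it is flat at both ends and decreases monotonically from 1 to 0); any other function with these properties would serve equally well. The names `step_down` and `cutoff` are illustrative.

```python
import math

def step_down(y):
    """Smooth monotone transition: 1 for y <= 0, 0 for y >= 1."""
    if y <= 0.0:
        return 1.0
    if y >= 1.0:
        return 0.0
    # Assumed interpolating function; flat to all orders at y = 0 and y = 1.
    return math.exp(-math.exp(-1.0 / y) / (1.0 - y))

def cutoff(x, m=1.0):
    """Phi_m(x) = Phi(x/m), where Phi(x) = step_down(|x| - 1).

    x is a sequence of coordinates in R^n. Phi_m equals 1 on the ball of
    radius m and vanishes outside the ball of radius 2m.
    """
    r = math.sqrt(sum(c * c for c in x)) / m
    return step_down(r - 1.0)
```

Multiplying samples of a Schwartz function $f$ by `cutoff(x, m)` for growing `m` gives the compactly supported approximants $f\Phi_m$ used in the density argument.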


⁴ We will refer to this metric later on when we discuss the dual space of $\mathcal{S}$.
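Now we go on with $f_0 \in C_0^\infty$. Let $L$ be such that $\operatorname{supp} f_0 \subset [-L, L]^n$, the centered cube in $\mathbb{R}^n$ with edges of length $2L$. We can consider $f_0$ as a smooth periodic function over the cube decomposition of $\mathbb{R}^n$. With the orthonormal basis

$$\phi_l(x) = \Big( \frac{1}{2L} \Big)^{n/2} \exp\Big( i \frac{\pi}{L} \, l \cdot x \Big) , \qquad l \in \mathbb{Z}^n ,$$

the Fourier coefficients are

$$c_l = \langle \phi_l | f_0 \rangle = \Big( \frac{1}{2L} \Big)^{n/2} \int \exp\Big( -i \frac{\pi}{L} \, l \cdot x \Big) f_0(x) \, d^n x . \tag{13.20}$$

From standard analysis, we know that the Fourier series of $f_0$ converges uniformly to $f_0$, and we now apply this. To relate to the Fourier transform of $f_0$, we note that (13.20) implies

$$\hat f_0\Big( \frac{\pi}{L} \, l \Big) = \Big( \frac{L}{\pi} \Big)^{n/2} c_l ,$$

which suggests taking the limit $L \to \infty$ of the $L$-Fourier series of $f_0$. The latter converges uniformly and is given by

$$f_0(x) = \sum_{l\in\mathbb{Z}^n} c_l \, \phi_l(x) = \sum_{l\in\mathbb{Z}^n} \Big( \frac{\pi}{L} \Big)^{n/2} \hat f_0\Big( \frac{\pi}{L} \, l \Big) \Big( \frac{1}{2L} \Big)^{n/2} \exp\Big( i \frac{\pi}{L} \, l \cdot x \Big) = \Big( \frac{1}{2\pi} \Big)^{n/2} \Big( \frac{\pi}{L} \Big)^{n} \sum_{l\in\mathbb{Z}^n} \exp\Big( i \frac{\pi}{L} \, l \cdot x \Big) \hat f_0\Big( \frac{\pi}{L} \, l \Big) .$$

But the right-hand side is just a Riemann sum for the integral $\mathcal{F}^* \hat f_0$, and it converges to exactly this integral for $L \to \infty$, since $\hat f_0 \in \mathcal{S}$. Hence $f_0(x) = \mathcal{F}^* \hat f_0$, and we can conclude that $\mathcal{F}^* = \mathcal{F}^{-1}$, i.e., (13.18).⁵

⁵ Note that our computation was just a mathematical way of expressing the fact that

$$\frac{1}{2\pi} \int e^{ikx} \, dk = \delta(x) .$$

We will comment on this formula later on.

From this we easily get (13.15) for $f, g \in \mathcal{S}$:

$$\langle g | f \rangle = \int g^*(x) f(x) \, d^n x = \int g^*(x) \Big( \frac{1}{2\pi} \Big)^{n/2} \int e^{ik\cdot x} \hat f(k) \, d^n k \, d^n x = \int \Big[ \Big( \frac{1}{2\pi} \Big)^{n/2} \int e^{-ik\cdot x} g(x) \, d^n x \Big]^* \hat f(k) \, d^n k = \langle \hat g | \hat f \rangle ,$$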


where interchanging the order of integration is unproblematic because the integrals converge absolutely. In addition, this immediately implies

$$\|\hat f\|_{L^2} = \|f\|_{L^2} , \tag{13.21}$$

for $f \in \mathcal{S}$, and thus the continuity of the map $\mathcal{F} : \mathcal{S} \to \mathcal{S}$ with respect to the $L^2$-norm.
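The norm equality (13.21) can be checked numerically for a concrete Schwartz function. The following sketch assumes dimension $n = 1$, the convention (13.16), and a plain Riemann sum on a truncated grid in place of the integral; the function name `fourier` and the grid parameters are illustrative choices. For the shifted Gaussian $f(x) = e^{-(x-1)^2/2}$ the transform is known in closed form, $\hat f(k) = e^{-ik} e^{-k^2/2}$, and $\|f\|^2 = \sqrt{\pi}$.

```python
import cmath
import math

def fourier(f, k, half_width=12.0, n=1201):
    """Riemann-sum approximation of (2*pi)^(-1/2) * integral of exp(-i k x) f(x) dx."""
    h = 2 * half_width / (n - 1)
    total = 0j
    for i in range(n):
        x = -half_width + i * h
        total += cmath.exp(-1j * k * x) * f(x)
    return total * h / math.sqrt(2 * math.pi)

def shifted_gaussian(x):
    """Test function f(x) = exp(-(x - 1)^2 / 2), a Schwartz function."""
    return math.exp(-(x - 1.0) ** 2 / 2)

def norm_sq_of_transform(f, k_max=8.0, dk=0.1):
    """Riemann-sum approximation of the squared L^2 norm of the transform of f."""
    steps = int(round(2 * k_max / dk))
    return dk * sum(abs(fourier(f, -k_max + j * dk)) ** 2 for j in range(steps + 1))
```

Calling `norm_sq_of_transform(shifted_gaussian)` should reproduce $\sqrt{\pi} \approx 1.7725$ up to the quadrature error, in agreement with (13.21).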

Another useful property of the Fourier transformation is that the Fourier trans- 
form of a product of two functions is given by the convolution of the Fourier trans- 
forms of the two functions. 


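Remark 13.5. Convolution
For $f, g \in \mathcal{S}$,

$$\mathcal{F}(fg)(k) = \Big( \frac{1}{2\pi} \Big)^{n/2} \int \hat f(k - k') \, \hat g(k') \, d^n k' =: \Big( \frac{1}{2\pi} \Big)^{n/2} \big( \hat f * \hat g \big)(k) ,$$

and

$$\mathcal{F}(f * g)(k) = \Big( \frac{1}{2\pi} \Big)^{n/2} \int e^{-ik\cdot x} \int f(x - y) \, g(y) \, d^n y \, d^n x = \Big( \frac{1}{2\pi} \Big)^{n/2} \int\!\!\int e^{-ik\cdot(z + y)} f(z) \, g(y) \, d^n z \, d^n y = (2\pi)^{n/2} \hat f(k) \, \hat g(k) .$$

Note that the convolution integral $f * g$ is also well defined for $f, g \in L^2$:

$$(f * g)(x) = \big\langle \tilde f(\,\cdot - x) , g \big\rangle_{L^2} ,$$

where $\tilde f(y) := f^*(-y)$. □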
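The second formula of the remark, $\mathcal{F}(f * g) = (2\pi)^{n/2} \hat f \hat g$, can be checked numerically in dimension $n = 1$. The sketch assumes Riemann sums on a truncated uniform grid, so it only makes sense for rapidly decaying functions; `grid`, `convolve` and `fourier_samples` are hypothetical helper names.

```python
import cmath
import math

def grid(half_width=10.0, n=401):
    """Uniform grid on [-half_width, half_width]; returns points and spacing."""
    h = 2 * half_width / (n - 1)
    return [-half_width + i * h for i in range(n)], h

def convolve(f, g, xs, h):
    """Samples of (f * g)(x) = integral of f(x - y) g(y) dy on the grid xs."""
    return [h * sum(f(x - y) * g(y) for y in xs) for x in xs]

def fourier_samples(values, xs, h, k):
    """Fourier transform at wave number k of a function given by grid samples."""
    total = sum(cmath.exp(-1j * k * x) * v for x, v in zip(xs, values))
    return total * h / math.sqrt(2 * math.pi)

def gaussian(x):
    """exp(-x^2/2), whose Fourier transform is exp(-k^2/2)."""
    return math.exp(-x * x / 2)
```

For $f = g = e^{-x^2/2}$ both sides equal $\sqrt{2\pi}\, e^{-k^2}$, which gives an independent check of the prefactor $(2\pi)^{n/2}$.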


In the next step, we use a density argument again, in order to extend the Fourier transformation to a unitary map on $L^2$. One way of doing this is to say that, with (13.21), the map $\mathcal{F}$ defines a continuous (with respect to the $L^2$-norm) linear operator on the dense subspace $\mathcal{S} \subset L^2$, and thus uniquely extends to a continuous linear operator on the whole space $L^2$. But since we have not introduced these notions yet, we shall work out the argument in more detail.

Let $f \in L^2$ and let $f_n$ be a sequence in $\mathcal{S}$ such that $f_n$ converges to $f$ in the $L^2$-norm. Such a sequence always exists, since $\mathcal{S}$ is dense in $L^2$. Then $f_n$ is a Cauchy sequence in $L^2$, and with (13.21) $\hat f_n$ is also a Cauchy sequence in $L^2$. By completeness it therefore has a limit $\varphi \in L^2$. This limit is independent of the choice of the original sequence $f_n$. For let $f'_n$ be another sequence in $\mathcal{S}$ converging to $f$. Then $(f_1, f'_1, f_2, f'_2, \dots)$ converges to $f$ as well, and so is Cauchy. But then by the same reasoning as before, $(\hat f_1, \hat f'_1, \hat f_2, \hat f'_2, \dots)$ is Cauchy and converges to $\varphi$, since the subsequence $\hat f_n$ converges to $\varphi$. So every other subsequence, and in particular $\hat f'_n$, converges to $\varphi$.

Consequently, we can define the Fourier transform of $f \in L^2$ to be given by the unique limit $\varphi$,

$$\hat f = \mathcal{F} f := \varphi .$$

It remains to show that (13.15) (which we showed for $f, g \in \mathcal{S}$) extends to $f, g \in L^2$. But this follows immediately from the norm continuity of the inner product $\langle \cdot , \cdot \rangle$. Let $(f_n), (g_n) \subset \mathcal{S}$ converge to $f, g \in L^2$ in norm. Then $\langle f_n, g_n \rangle$ also converges to $\langle f, g \rangle$. This follows from the Schwarz inequality (13.3):

$$|\langle f_n, g_n \rangle - \langle f, g \rangle| \le |\langle f_n - f, g \rangle| + |\langle f_n, g - g_n \rangle| \le \|f_n - f\| \, \|g\| + \|f_n\| \, \|g_n - g\| .$$

Hence $\mathcal{F}$ is a unitary operator on $L^2$.
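If one wants to compute $\mathcal{F} f$ in the case where $f$ is indeed only an $L^2$-function, then one can approximate $f$ for example by the $L^1 \cap L^2$-function $f \chi_{[-L,L]^n}$ and approximate $\hat f$ as in

$$\hat f(k) = \Big( \frac{1}{2\pi} \Big)^{n/2} \lim_{L\to\infty} \int_{[-L,L]^n} e^{-ik\cdot x} f(x) \, d^n x ,$$

where the limit is understood in the $L^2$ sense.

Remark 13.6. Distributions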

We extended the Fourier transformation from $\mathcal{S}$ to $L^2$ by using the $L^2$ continuity of $\mathcal{F} : \mathcal{S} \to \mathcal{S}$. But as we have seen, $\mathcal{F} : \mathcal{S} \to \mathcal{S}$ is continuous in a much stronger sense, namely with respect to the family of seminorms $\|\cdot\|_{\alpha,\beta}$. This allows one to extend $\mathcal{F}$ to a much larger class of "functions", called tempered distributions. A tempered distribution $\varphi \in \mathcal{S}'$ is a continuous linear functional on $\mathcal{S}$, i.e., a linear map $\varphi : \mathcal{S} \to \mathbb{C}$ such that $\|f_n - f\|_{\alpha,\beta} \to 0$ for all $\alpha, \beta \in \mathbb{N}_0^n$ implies $\varphi(f_n) \to \varphi(f)$ in $\mathbb{C}$. The space of linear continuous functionals on a topological vector space $V$ is called its dual space, denoted by $V'$. So the space of tempered distributions $\mathcal{S}'$ is just the dual space of $\mathcal{S}$ with respect to the topology on $\mathcal{S}$ induced by the family of seminorms $\|\cdot\|_{\alpha,\beta}$.⁶

For example, every function in $\mathcal{S}$ itself defines a tempered distribution as follows. For $\varphi \in \mathcal{S}$ let

$$T_\varphi(f) = \int \varphi(x) \, f(x) \, d^n x = \langle \varphi^* | f \rangle_{L^2} , \qquad \forall f \in \mathcal{S} . \tag{13.22}$$

Motivated by the "similarity" with $L^2$, one often extends the inner product notation to the natural pairing of linear functionals with vectors, e.g., in the present case, for $T \in \mathcal{S}'$ and $f \in \mathcal{S}$, one writes

$$T(f) = \langle T | f \rangle_{\mathcal{S}',\mathcal{S}} . \tag{13.23}$$


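For any continuous linear map $A : \mathcal{S} \to \mathcal{S}$, one can now define the adjoint map $A^* : \mathcal{S}' \to \mathcal{S}'$ through

$$(A^* \varphi)(f) = \varphi(A f) .$$

This simple trick allows one to extend the Fourier transform (and, of course, also its inverse) from $\mathcal{S}$ to $\mathcal{S}'$. Let $\varphi \in \mathcal{S}'$. Then $\mathcal{F}\varphi \in \mathcal{S}'$ is defined by

$$(\mathcal{F}\varphi)(f) := \varphi(\mathcal{F} f) , \qquad \forall f \in \mathcal{S} , \tag{13.24}$$

or symbolically

$$\hat\varphi(f) = \varphi(\hat f) , \qquad \forall f \in \mathcal{S} .$$

This is indeed an extension of the Fourier transformation on $\mathcal{S}$ in the sense of (13.22). For $\varphi, f \in \mathcal{S}$, we have

$$\widehat{T_\varphi}(f) \overset{(13.24)}{=} T_\varphi(\hat f) \overset{(13.22)}{=} \int \varphi(x) \, \hat f(x) \, d^n x = \int \varphi(x) \Big( \frac{1}{2\pi} \Big)^{n/2} \int e^{-ik\cdot x} f(k) \, d^n k \, d^n x = \int \hat\varphi(k) \, f(k) \, d^n k = T_{\hat\varphi}(f) .$$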

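⁶ One can carry out a similar construction replacing $\mathcal{S}$ by $\mathcal{D} := C_0^\infty$. The dual space $\mathcal{D}'$ of $\mathcal{D}$ is called the space of distributions. For the purposes of Fourier transformation, $\mathcal{S}$ and $\mathcal{S}'$ are advantageous, because, as we have already seen, $\mathcal{S}$, and as a consequence also $\mathcal{S}'$, is invariant under $\mathcal{F}$.

However, the identification of a function $\varphi$ with a distribution $T_\varphi$ as in (13.22) makes sense for a much larger class of functions than $\mathcal{S}$. A sufficient condition would be, for example, that $\varphi$ is measurable and polynomially bounded or, alternatively, that $\varphi \in L^2$. In both cases, $T_\varphi$ defines a tempered distribution. An interesting example of this kind is the constant function

$$\varphi(x) = \Big( \frac{1}{2\pi} \Big)^{n/2} ,$$

i.e., the distribution

$$T_\varphi(f) = \Big( \frac{1}{2\pi} \Big)^{n/2} \int f(x) \, d^n x .$$

Its Fourier transform is given by

$$\widehat{T_\varphi}(f) = T_\varphi(\hat f) = \Big( \frac{1}{2\pi} \Big)^{n/2} \int \hat f(k) \, d^n k = f(0) \ \text{“}{=}\text{”} \int \delta(x) \, f(x) \, d^n x =: \delta(f) ,$$

so that, heuristically (hence the inverted commas),

$$\Big( \frac{1}{2\pi} \Big)^{n} \int d^n k \, e^{-ik\cdot x} = \delta(x) .$$

This is the heuristic formula $\mathcal{F}\big[ (2\pi)^{-n/2} \big] = \delta$ or more simply,

$$\int d^n k \, e^{-ik\cdot x} = (2\pi)^n \delta(x) .$$

It provides a powerful heuristic computational tool. For example, one obtains the inversion formula simply through

$$(\mathcal{F}^* \hat f)(x) = \Big( \frac{1}{2\pi} \Big)^{n/2} \int e^{ik\cdot x} \hat f(k) \, d^n k = \Big( \frac{1}{2\pi} \Big)^{n} \int\!\!\int e^{ik\cdot x} e^{-ik\cdot x'} f(x') \, d^n k \, d^n x' = \Big( \frac{1}{2\pi} \Big)^{n} \int d^n x' \, f(x') \int d^n k \, e^{ik\cdot(x - x')} = \int d^n x' \, f(x') \, \delta(x - x') = f(x) .$$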


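In the same way as for the Fourier transformation, one can extend other continuous linear mappings from $\mathcal{S}$ to $\mathcal{S}$ to mappings from $\mathcal{S}'$ to $\mathcal{S}'$. An important example are the partial derivatives $\partial_{x_j} : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$, where one defines the distributional derivative for $\varphi \in \mathcal{S}'$ by

$$\Big( \frac{\partial}{\partial x_j} \varphi \Big)(f) := \varphi\Big( - \frac{\partial}{\partial x_j} f \Big) , \qquad \forall f \in \mathcal{S} .$$

Here the minus sign on the right-hand side ensures that the distributional derivative does indeed extend the usual derivative, in the sense that, for $\varphi \in \mathcal{S}$, one has

$$\big( \partial_{x_j} T_\varphi \big)(f) = T_{\partial_{x_j} \varphi}(f) , \qquad \forall f \in \mathcal{S} .$$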


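In this way one can, for example, understand the Laplace operator $\Delta_x$ appearing in the Schrödinger equation as an operator acting on distributions, and thus in particular also as an operator acting on $L^2$-functions. One can read Schrödinger's equation as an equality for distributions, and would say that $\psi(t)$ solves the Schrödinger equation in the distributional sense. What we really mean then is that, for every test function $f \in \mathcal{S}$, the time-dependent distribution $\psi(t) \in \mathcal{S}'$ satisfies [and here the alternative notation (13.23) is convenient]

$$i \frac{d}{dt} \langle \psi(t) | f \rangle_{\mathcal{S}',\mathcal{S}} = \Big\langle - \tfrac{1}{2} \Delta_x \psi(t) \Big| f \Big\rangle_{\mathcal{S}',\mathcal{S}} = \Big\langle \psi(t) \Big| - \tfrac{1}{2} \Delta_x f \Big\rangle_{\mathcal{S}',\mathcal{S}} ,$$

which is now an equality in $\mathbb{C}$.

Another example is convolution, where for $\psi \in \mathcal{S}$ the map

$$\psi * : \mathcal{S} \to \mathcal{S} , \qquad \varphi \mapsto \psi * \varphi$$

can be extended to a map

$$\psi * : \mathcal{S}' \to \mathcal{S}' , \qquad \varphi \mapsto \psi * \varphi .$$

□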

Remark 13.7. Uniqueness of the Extensions

Both extensions of the Fourier transform, the one to $L^2$ and the one to $\mathcal{S}'$, are unique. This is because in both cases the extensions are continuous maps which are uniquely defined by their values on the dense set $\mathcal{S}$. [Although we did not show that $\mathcal{S}$ with the identification (13.22) is dense in $\mathcal{S}'$, this is in fact true.] Now we saw that $L^2$-functions can also be identified with distributions by the identification (13.22), i.e., $L^2 \subset \mathcal{S}'$. It is also true, and easily seen, that the restriction of $\mathcal{F} : \mathcal{S}' \to \mathcal{S}'$ to $L^2 \subset \mathcal{S}'$ agrees with the Fourier transform which was directly defined on $L^2$ previously. □


Remark 13.8. On the Decay of $L^2$-Functions

One might naively expect (square) integrable functions to decay at $\infty$. The example $e^{-x^4 (\sin x)^2}$ shows that this is not true in general. However, for $L^2(\mathbb{R})$, the following shows what happens. If, not only $\varphi$, but also $\varphi'$ is square integrable, i.e., $\varphi, \varphi' \in L^2$, then $\varphi(x) \to 0$ for $x \to \pm\infty$. Here $\varphi' \in \mathcal{S}'$ is first defined as the distributional derivative of $\varphi \in L^2$, but we say $\varphi' \in L^2$ if the distribution $\varphi'$ is indeed given as the distribution associated with an $L^2$-function $\varphi'$. It turns out that, for $\varphi, \varphi' \in L^2(\mathbb{R})$, one has the usual theorem of calculus

$$\varphi(x) = \int_0^x \varphi'(y) \, dy + c ,$$

where $\varphi$ is an absolutely continuous function of $x$ [4]. Hence we can do integration by parts and find that

$$\big| \varphi^2(b) - \varphi^2(a) \big| = 2 \Big| \int_a^b \varphi(x) \, \varphi'(x) \, dx \Big| \le 2 \Big( \int_a^b |\varphi(x)|^2 \, dx \Big)^{1/2} \Big( \int_a^b |\varphi'(x)|^2 \, dx \Big)^{1/2} \to 0 \qquad \text{[by (13.3)]}$$

for $a, b \to \infty$, since both integrands are integrable. Hence $\varphi^2(x)$ converges as $x \to \infty$, since for every sequence $x_n \to \infty$ the sequence $\varphi^2(x_n)$ is Cauchy and therefore convergent. But the limit must be zero, otherwise $|\varphi|^2$ could not be integrable. □


13.2 Bilinear Forms and Bounded Linear Operators 


We now return from $L^2$ to the abstract Hilbert space setting. Our aim is to understand that the symmetric bilinear (more precisely sesquilinear) forms are in one-to-one correspondence with the bounded symmetric operators. This is completely analogous to the same statement in linear algebra and rooted in the self-duality of Hilbert spaces.

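A bounded linear functional $\ell$ on $\mathcal{H}$ is a linear map $\ell : \mathcal{H} \to \mathbb{C}$, for which there is a constant $c < \infty$ such that

$$|\ell(\varphi)| \le c \, \|\varphi\| , \qquad \forall \varphi \in \mathcal{H} .$$

Clearly, every $\psi \in \mathcal{H}$ defines a bounded linear functional via the inner product, i.e.,

$$\ell_\psi : \mathcal{H} \to \mathbb{C} , \qquad \varphi \mapsto \langle \psi | \varphi \rangle .$$

But are these all the bounded linear functionals on $\mathcal{H}$? The answer is affirmative:

Theorem 13.2. Let $\ell$ be a bounded linear functional on $\mathcal{H}$. Then there is a unique vector $\psi_\ell \in \mathcal{H}$ such that $\ell(\varphi) = \langle \psi_\ell | \varphi \rangle$ for all $\varphi \in \mathcal{H}$.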


This means that, to each linear functional $\ell$, there corresponds a unique vector onto which $\ell$ projects. As mentioned before, the space of bounded linear functionals on $\mathcal{H}$ is called the dual space of $\mathcal{H}$ and is denoted by $\mathcal{H}^*$. The unique identification of $\ell \in \mathcal{H}^*$ with $\psi_\ell \in \mathcal{H}$ claimed in the theorem yields $\mathcal{H}^* \cong \mathcal{H}$, i.e., $\mathcal{H}$ is dual to itself. Before we come to the simple but not completely trivial proof of the theorem, we need to understand some general geometric properties of Hilbert spaces.

Let $\mathcal{M} \subset \mathcal{H}$ be some arbitrary subset of $\mathcal{H}$. Then one defines the orthogonal complement of $\mathcal{M}$ in $\mathcal{H}$ as

$$\mathcal{M}^\perp := \big\{ \psi \in \mathcal{H} : \langle \varphi, \psi \rangle = 0 \ \text{for all } \varphi \in \mathcal{M} \big\} . \tag{13.25}$$

The linearity and continuity of the inner product imply immediately that $\mathcal{M}^\perp$ is a closed subspace of $\mathcal{H}$. The following theorem also holds for non-separable Hilbert spaces, but we prove it only for separable ones.

Theorem 13.3. Let $\mathcal{M} \subset \mathcal{H}$ be a closed subspace of a separable Hilbert space $\mathcal{H}$. Then one has $\mathcal{H} = \mathcal{M} \oplus \mathcal{M}^\perp$, i.e., each vector $\varphi \in \mathcal{H}$ can be uniquely decomposed as $\varphi = \psi + \psi^\perp$, with $\psi \in \mathcal{M}$ and $\psi^\perp \in \mathcal{M}^\perp$.


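To prove this, we first note that, as closed subspaces of a separable Hilbert space, $\mathcal{M}$ and $\mathcal{M}^\perp$ are by themselves separable Hilbert spaces, and as such allow for orthonormal bases $(\phi_n)_{n\in\mathbb{N}}$ and $(\phi_m^\perp)_{m\in\mathbb{N}}$, respectively. We will show that the orthonormal sequence $(\phi_n)_{n\in\mathbb{N}} \cup (\phi_m^\perp)_{m\in\mathbb{N}}$ is an orthonormal basis for $\mathcal{H}$, using the criterion (13.7). Assume that

$$\langle \phi_n, \varphi \rangle = 0 = \langle \phi_m^\perp, \varphi \rangle , \tag{13.26}$$

for all $n, m \in \mathbb{N}$. The first equality implies that $\langle \psi, \varphi \rangle = 0$, for all $\psi \in \mathcal{M}$, and therefore $\varphi \in \mathcal{M}^\perp$. Since $(\phi_m^\perp)_{m\in\mathbb{N}}$ is by assumption an orthonormal basis of $\mathcal{M}^\perp$, the second equality in (13.26) implies that $\varphi = 0$. Hence by (13.7), we see that $(\phi_n)_{n\in\mathbb{N}} \cup (\phi_m^\perp)_{m\in\mathbb{N}}$ is an orthonormal basis of $\mathcal{H}$. Moreover, for any $\varphi \in \mathcal{H}$, we have $\varphi = \psi + \psi^\perp$, with

$$\psi = \sum_{n=1}^{\infty} \langle \phi_n, \varphi \rangle \, \phi_n \in \mathcal{M} , \qquad \psi^\perp = \sum_{m=1}^{\infty} \langle \phi_m^\perp, \varphi \rangle \, \phi_m^\perp \in \mathcal{M}^\perp .$$

Uniqueness of the decomposition is a very easy exercise.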

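We now come to the proof of Theorem 13.2. Pick any bounded linear functional $\ell$ on $\mathcal{H}$. We are looking for a corresponding vector $\psi_\ell \in \mathcal{H}$ on which $\ell$ projects. In particular, all vectors orthogonal to $\psi_\ell$ are mapped to zero by $\ell$. Hence we look at the null space of $\ell$, viz.,

$$\mathcal{M} = \{ \varphi \in \mathcal{H} \mid \ell(\varphi) = 0 \} .$$

Note that $\mathcal{M}$ is a closed subspace of $\mathcal{H}$, since $\ell$ is bounded and therefore continuous. If $\mathcal{M} = \mathcal{H}$, we could pick $\psi_\ell = 0$ and we would be done. Otherwise, by Theorem 13.3, the orthogonal complement $\mathcal{M}^\perp$ is at least one-dimensional. Indeed, $\mathcal{M}^\perp$ is exactly one-dimensional, which is basically the statement of the theorem. To see this, let $\psi_0, \psi_1 \in \mathcal{M}^\perp \setminus \{0\}$. Then

$$\ell(\psi_0 - \alpha \psi_1) = \ell(\psi_0) - \alpha \, \ell(\psi_1) = 0 ,$$

for $\alpha = \ell(\psi_0) / \ell(\psi_1)$, and thus

$$\psi_0 - \alpha \psi_1 \in \mathcal{M} \cap \mathcal{M}^\perp = \{0\} .$$

Hence $\mathcal{M}^\perp$ is one-dimensional, and with a normalized $\psi_0 \in \mathcal{M}^\perp$ we can use Theorem 13.3 (and its proof) to uniquely decompose any $\varphi \in \mathcal{H}$ as

$$\varphi = \psi + \psi^\perp = \psi + \langle \psi_0 | \varphi \rangle \, \psi_0 ,$$

with $\psi \in \mathcal{M}$. But then we find the desired result

$$\ell(\varphi) = \ell(\psi + \psi^\perp) = \ell(\psi) + \ell\big( \langle \psi_0 | \varphi \rangle \psi_0 \big) = \ell(\psi_0) \, \langle \psi_0 | \varphi \rangle = \langle \psi_\ell | \varphi \rangle ,$$

with $\psi_\ell = \ell(\psi_0)^* \, \psi_0$, where we used antilinearity in the first argument of the inner product.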
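In finite dimensions the representing vector of Theorem 13.2 can be read off directly from the values of the functional on a basis. The sketch below assumes $\mathcal{H} = \mathbb{C}^n$ with the standard basis and an inner product that is antilinear in the first argument; `riesz_vector` is an illustrative name, not notation from the text.

```python
def inner(phi, psi):
    """Inner product on C^n, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def riesz_vector(ell, dim):
    """Return psi_ell with ell(phi) = <psi_ell|phi> for a linear functional ell on C^dim.

    The j-th component of psi_ell is the complex conjugate of ell(e_j),
    where e_j is the j-th standard basis vector.
    """
    basis = [[1 + 0j if i == j else 0j for i in range(dim)] for j in range(dim)]
    return [complex(ell(e)).conjugate() for e in basis]
```

The complex conjugation mirrors the factor $\ell(\psi_0)^*$ in the proof: it compensates for the antilinearity of the inner product in its first slot.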

Now we come to the equivalence of bounded linear operators and bounded bilinear, or more precisely sesquilinear, forms. A linear operator $A : \mathcal{H} \to \mathcal{H}$ is bounded if there is a constant $C < \infty$ such that

$$\|A\varphi\| \le C \|\varphi\| , \qquad \text{for all } \varphi \in \mathcal{H} .$$

The norm of $A$ is the smallest such $C$,

$$\|A\| := \sup_{\varphi \ne 0} \frac{\|A\varphi\|}{\|\varphi\|} = \sup_{\|\varphi\| = 1} \|A\varphi\| , \tag{13.27}$$

and it is quite easy to see that this definition turns the space of bounded linear operators on $\mathcal{H}$ into a complete normed space. Note that a bounded operator is obviously continuous, but the converse is true as well: every continuous linear operator is bounded.


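Theorem 13.4. Let $B(\cdot\,, \cdot)$ be a map from $\mathcal{H} \times \mathcal{H}$ to $\mathbb{C}$ with the following properties. For all $\varphi, \psi, \chi \in \mathcal{H}$ and $\alpha, \beta \in \mathbb{C}$ one has

(i) $B(\varphi, \alpha\psi + \beta\chi) = \alpha B(\varphi, \psi) + \beta B(\varphi, \chi)$,
(ii) $B(\varphi, \psi) = B(\psi, \varphi)^*$,
(iii) $|B(\varphi, \psi)| \le C \|\varphi\| \, \|\psi\|$.

Then there exists a unique symmetric bounded linear operator $A$ on $\mathcal{H}$ such that

$$B(\varphi, \psi) = \langle A\varphi | \psi \rangle , \qquad \text{for all } \varphi, \psi \in \mathcal{H} .$$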


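This is an immediate consequence of Theorem 13.2, since $B(\varphi, \cdot\,)$ is obviously a bounded linear functional. Hence the action of $B(\varphi, \cdot\,)$ is given by projecting onto a unique vector $\tilde\varphi$, viz.,

$$B(\varphi, \psi) = \langle \tilde\varphi | \psi \rangle .$$

The mapping $\varphi \mapsto \tilde\varphi$ defines an operator $A$ through $A\varphi = \tilde\varphi$ with the property that $\langle A\varphi | \psi \rangle = B(\varphi, \psi)$. But the properties (i)-(iii) of $B$ imply immediately that $A$ is linear and bounded. Linearity is obvious and boundedness is

$$\|A\varphi\|^2 = \langle A\varphi | A\varphi \rangle = B(\varphi, A\varphi) \le C \|\varphi\| \, \|A\varphi\| ,$$

whence

$$\|A\varphi\| \le C \|\varphi\| .$$

Finally (ii) implies symmetry of $A$, which means that for all $\varphi, \psi \in \mathcal{H}$,

$$\langle A\varphi | \psi \rangle = B(\varphi, \psi) = B(\psi, \varphi)^* = \langle A\psi | \varphi \rangle^* = \langle \varphi | A\psi \rangle . \tag{13.28}$$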
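The correspondence of Theorem 13.4 is concrete in finite dimensions: the operator is fixed by the values of the form on basis vectors. The sketch below assumes $\mathcal{H} = \mathbb{C}^n$ with the standard basis, a form `B` given as a Python callable, and matrices stored as nested lists; `operator_from_form` and `apply` are hypothetical helper names.

```python
def inner(phi, psi):
    """Inner product on C^n, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def operator_from_form(B, dim):
    """Matrix A with B(phi, psi) = <A phi|psi>.

    Since <A e_j|e_i> is the complex conjugate of A[i][j], the entries
    are A[i][j] = conj(B(e_j, e_i)).
    """
    e = [[1 + 0j if i == j else 0j for i in range(dim)] for j in range(dim)]
    return [[complex(B(e[j], e[i])).conjugate() for j in range(dim)]
            for i in range(dim)]

def apply(A, phi):
    """Matrix-vector product A phi."""
    return [sum(a * x for a, x in zip(row, phi)) for row in A]
```

Property (ii) of the form then shows up as the matrix identity `A[i][j] == A[j][i].conjugate()`, the finite-dimensional version of (13.28).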


13.3 Tensor Product Spaces 


Now we come to the question of how to describe entanglement of wave functions for $N$ particles. Let us proceed in a purely axiomatic way for the moment. For one particle, the state space is the Hilbert space of what we shall call wave functions, $L^2(\mathbb{R}^3, d^3x)$. For two particles⁷ we have several possibilities, if we proceed axiomatically, without taking the physical theory into account. For example, one could take the direct sum of the one-particle spaces $\mathcal{H}_1$ and $\mathcal{H}_2$:

$$\mathcal{H}_1 \oplus \mathcal{H}_2 = \Big\{ \begin{pmatrix} \varphi_1 \\ \varphi_2 \end{pmatrix} : \varphi_1 \in \mathcal{H}_1 , \ \varphi_2 \in \mathcal{H}_2 \Big\} , \tag{13.29}$$

with the inner product

$$\Big\langle \begin{pmatrix} \varphi_1 \\ \varphi_2 \end{pmatrix} \Big| \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix} \Big\rangle = \langle \varphi_1 | \psi_1 \rangle_{\mathcal{H}_1} + \langle \varphi_2 | \psi_2 \rangle_{\mathcal{H}_2} .$$

Physically, this amounts to de Broglie's conception of having one wave per particle (the dimensions of the spaces add up). However, we know already that the wave function lives on configuration space, i.e., it lies in $L^2(\mathbb{R}^6, d^6x)$. And this is not the direct sum of the one-particle spaces $L^2(\mathbb{R}^3, d^3x)$, but the tensor product, i.e., the space of linear superpositions of products $\varphi_1(x_1) \varphi_2(x_2)$.

Formally, one can form the tensor product of two given spaces .#% and #4, by 
just taking linear combinations of “formal products” @) ® @2, ¢; € H. One then de- 
fines an inner product on that vector space of sums of formal products in a natural 


7 We leave it to the reader to formulate the results of this section for N particles. To simplify the 
notation, we exemplify with 2 particles only. 
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way, in order to turn it, after completion, into a Hilbert space once again. However, 
this is so formal that we should hold on for a second and ask ourselves why we 
should be interested in getting the many-particle spaces in such an abstract manner. 
The reason is that in many models one would like to couple different systems with 
different kinds of degrees of freedom. With the notion of tensor product, one obtains 
the full Hilbert space of such coupled systems in a straightforward and unambigu- 
ous way, by just taking the tensor product of the single-particle Hilbert spaces. For 
example one can now easily combine spatial degrees of freedom with spin degrees 
of freedom. 

We will discuss this later in more detail, but the moral is that we are not so much 
concerned about the many-particle L7-space, which is given by physics anyway. In- 
stead we want a convenient way to include new degrees of freedom into our models. 
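Still we do not wish to proceed too abstractly, and choose a slightly more concrete
road to introduce the tensor product. For $\varphi_1 \in \mathcal{H}_1$ and $\varphi_2 \in \mathcal{H}_2$, we define the
bilinear (for $N$ factors an $N$-linear) map

$$ \varphi_1 \otimes \varphi_2 : \mathcal{H}_1 \times \mathcal{H}_2 \to \mathbb{C} , \qquad
\varphi_1 \otimes \varphi_2 (\psi_1, \psi_2) = \langle \varphi_1 | \psi_1 \rangle_{\mathcal{H}_1} \langle \varphi_2 | \psi_2 \rangle_{\mathcal{H}_2} . \tag{13.30} $$

Then we take all finite linear combinations with coefficients in $\mathbb{C}$, i.e.,

$$ \mathrm{span}_{\mathbb{C}}(\otimes) := \mathrm{span}_{\mathbb{C}} \big( \varphi_1 \otimes \varphi_2 ,\ \varphi_1 \in \mathcal{H}_1 ,\ \varphi_2 \in \mathcal{H}_2 \big) , $$

and on that space define the inner product as the linear extension of

$$ \langle \varphi_1 \otimes \varphi_2 | \psi_1 \otimes \psi_2 \rangle_{\otimes} = \langle \varphi_1 | \psi_1 \rangle_{\mathcal{H}_1} \langle \varphi_2 | \psi_2 \rangle_{\mathcal{H}_2} . \tag{13.31} $$

What is still missing for $\mathrm{span}_{\mathbb{C}}(\otimes)$ to be a Hilbert space? Completeness, i.e., closedness
under Cauchy convergence! We therefore define $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$ as the completion
of $\mathrm{span}(\otimes)$ under the norm $\| \cdot \|_{\otimes} := \sqrt{\langle \cdot | \cdot \rangle_{\otimes}}$.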
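In finite dimensions the construction can be made concrete. The following is a minimal sketch, not part of the original text, under the assumption that $\mathcal{H}_1 = \mathbb{C}^2$ and $\mathcal{H}_2 = \mathbb{C}^3$ and that a product vector is represented by its coordinates in the product basis (the Kronecker product). The helper names `inner` and `kron` are ours. The check is that the inner product of two product vectors factorizes as in (13.31).

```python
# Illustrative sketch: C^2 and C^3 stand in for H_1 and H_2 (an assumption),
# and the Kronecker product stands in for the tensor product of two vectors.

def inner(phi, psi):
    """Inner product <phi|psi>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def kron(u, v):
    """Coordinates of u (x) v in the product basis e_k (x) f_l."""
    return [a * b for a in u for b in v]

phi1, phi2 = [1 + 2j, 0.5j], [2.0, -1j, 1 + 1j]
psi1, psi2 = [0.3, 1 - 1j], [1j, 2.0, -0.5]

# Left- and right-hand side of (13.31)
lhs = inner(kron(phi1, phi2), kron(psi1, psi2))
rhs = inner(phi1, psi1) * inner(phi2, psi2)
factorization_error = abs(lhs - rhs)

# The product space has dimension 2 * 3 = 6
product_dimension = len(kron(phi1, phi2))
```

In this finite-dimensional setting no completion is needed, since every finite-dimensional inner product space is already complete.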


Remark 13.9. About Completion 

We should say a few words about the general idea of completion. Completing a
unitary space $\mathcal{M}$ means finding a complete unitary space $\mathcal{H}$, i.e., a Hilbert space,
in which $\mathcal{M}$ can be isometrically and densely embedded. The canonical way to
construct $\mathcal{H}$ is to consider equivalence classes of Cauchy sequences. Two Cauchy
sequences $f_n, g_n \in \mathcal{M}$ are equivalent if $\lim_{n\to\infty} \| f_n - g_n \| = 0$. This obviously
defines an equivalence relation and $\mathcal{H}$ can be defined as the space of equivalence
classes of Cauchy sequences. One can now define an inner product on $\mathcal{H}$ through
$\lim_{n\to\infty} \langle f_n | g_n \rangle$, which is independent of the chosen representatives, due to
continuity of the inner product. Finally, one can prove completeness of $\mathcal{H}$ along the
following lines. From a given Cauchy sequence $F_n \in \mathcal{H}$ (a sequence of sequences
in $\mathcal{M}$) extract the diagonal sequence $f = (F_{n,n})_n$, which is again a Cauchy sequence
in $\mathcal{M}$ (use the triangle inequality here), i.e., $f \in \mathcal{H}$. Finally convince yourself that
$f$ is the limit of the Cauchy sequence $F_n$. One can now recover $\mathcal{M}$ in $\mathcal{H}$ by
identifying with $f \in \mathcal{M}$ the constant Cauchy sequence $f_n = f$ in $\mathcal{H}$. By construction the
constant sequences are dense in $\mathcal{H}$, and isometry is evident.


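However, we were a bit hasty when we talked about extending (13.31) by linearity to
$\mathrm{span}(\otimes)$ (resp., $\mathcal{H}$), since it is not a priori clear whether this procedure is unique.
More precisely, if for $\Phi, \Psi \in \mathrm{span}(\otimes)$ we define $\langle \Phi | \Psi \rangle_{\otimes}$ through linear extension
of (13.31), the result must be independent of the representation of $\Phi$ and $\Psi$. In other
words, if $\Phi$ is the zero form, then $\langle \Phi | \Psi \rangle_{\otimes} = 0$ must hold for all $\Psi \in \mathrm{span}(\otimes)$. To
see this let

$$ \Psi = \sum_{k=1}^{N} c_k\, \varphi_k \otimes \eta_k . $$

Then, because of linearity,

$$ \langle \Phi | \Psi \rangle_{\otimes} = \Big\langle \Phi \Big| \sum_{k=1}^{N} c_k\, \varphi_k \otimes \eta_k \Big\rangle_{\otimes}
= \sum_{k=1}^{N} c_k \langle \Phi | \varphi_k \otimes \eta_k \rangle_{\otimes}
= \sum_{k=1}^{N} c_k\, \Phi(\varphi_k, \eta_k) = 0 , $$

since $\Phi$ is the zero form.

Moreover, the inner product must be positive, i.e., $\langle \Psi | \Psi \rangle_{\otimes} > 0$ for $\Psi \neq 0$. To
check this, let

$$ \Psi = \sum_{k=1}^{N} c_k\, \varphi_k \otimes \eta_k . $$

By decomposing the vectors $(\varphi_k)_{k=1,\dots,N}$ and $(\eta_k)_{k=1,\dots,N}$ with respect to orthonormal
bases $(\tilde\varphi_k)$ and $(\tilde\eta_k)$ of the corresponding subspaces $\mathrm{span}(\varphi_k) \subset \mathcal{H}_1$ and
$\mathrm{span}(\eta_k) \subset \mathcal{H}_2$, we can write

$$ \Psi = \sum_{k,l=1}^{N} \alpha_{kl}\, \tilde\varphi_k \otimes \tilde\eta_l , $$

whence we find that

$$ \begin{aligned}
\langle \Psi | \Psi \rangle_{\otimes}
&= \Big\langle \sum_{k,l=1}^{N} \alpha_{kl}\, \tilde\varphi_k \otimes \tilde\eta_l \Big| \sum_{n,m=1}^{N} \alpha_{nm}\, \tilde\varphi_n \otimes \tilde\eta_m \Big\rangle_{\otimes} \\
&= \sum_{k,l,m,n=1}^{N} \alpha_{kl}^* \alpha_{nm} \langle \tilde\varphi_k \otimes \tilde\eta_l | \tilde\varphi_n \otimes \tilde\eta_m \rangle_{\otimes} \\
&= \sum_{k,l,m,n=1}^{N} \alpha_{kl}^* \alpha_{nm} \langle \tilde\varphi_k | \tilde\varphi_n \rangle_{\mathcal{H}_1} \langle \tilde\eta_l | \tilde\eta_m \rangle_{\mathcal{H}_2} \\
&= \sum_{k,l,m,n=1}^{N} \alpha_{kl}^* \alpha_{nm}\, \delta_{kn} \delta_{lm}
= \sum_{k,l=1}^{N} \alpha_{kl}^* \alpha_{kl}
= \sum_{k,l=1}^{N} |\alpha_{kl}|^2 > 0 .
\end{aligned} $$

Remark 13.10. Warning: Product Functions Are Not Typical
Generically, $\Psi \in \mathcal{H}$ is not of product form $\Psi = \varphi_1 \otimes \varphi_2$, but of the form

$$ \Psi = \sum_{i,j} \alpha_{ij}\, \varphi_i \otimes \psi_j . $$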

Note also that the dimension of the tensor product space is the product of the dimen- 
sions of the single spaces. This is a clear statement for finite-dimensional spaces, and 
it also holds true for infinite-dimensional spaces in the following sense. 


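Theorem 13.5. If $(\varphi_k)$ and $(\psi_l)$ are orthonormal bases of $\mathcal{H}_1$ and $\mathcal{H}_2$, then the set
$(\varphi_k \otimes \psi_l)_{k,l}$ is an orthonormal basis of $\mathcal{H}_1 \otimes \mathcal{H}_2$.


Clearly, the $(\varphi_k \otimes \psi_l)_{k,l}$ are an orthonormal system. One way to see that they do
indeed form a basis is to show that the closure $S$ of $\mathrm{span}\big((\varphi_k \otimes \psi_l)_{k,l}\big)$ contains
$\mathrm{span}_{\mathbb{C}}(\otimes)$ and therefore also its closure $\mathcal{H}_1 \otimes \mathcal{H}_2$. But this follows if we just show
that $\varphi \otimes \psi \in S$ for all $\varphi \in \mathcal{H}_1$ and $\psi \in \mathcal{H}_2$. Now let

$$ \varphi = \sum_{k=1}^{\infty} \alpha_k \varphi_k , \qquad \psi = \sum_{l=1}^{\infty} \beta_l \psi_l , $$

and define

$$ \varphi(N) = \sum_{k=1}^{N} \alpha_k \varphi_k , \qquad \psi(N) = \sum_{l=1}^{N} \beta_l \psi_l . $$

Then $\varphi(N) \otimes \psi(N) = \sum_{k,l=1}^{N} \alpha_k \beta_l\, \varphi_k \otimes \psi_l$, and we see that the difference
$\| \varphi \otimes \psi - \varphi(N) \otimes \psi(N) \|$ goes to zero as $N \to \infty$:

$$ \begin{aligned}
\| \varphi \otimes \psi - \varphi(N) \otimes \psi(N) \|
&= \big\| [\varphi - \varphi(N)] \otimes \psi - \varphi(N) \otimes [\psi(N) - \psi] \big\| \\
&\le \| \varphi - \varphi(N) \| \, \| \psi \| + \| \varphi(N) \| \, \| \psi(N) - \psi \| \\
&\le \| \varphi - \varphi(N) \| \, \| \psi \| + \| \varphi \| \, \| \psi(N) - \psi \| \to 0 .
\end{aligned} $$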

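After these abstract considerations, let us look at a concrete example. The tensor
product $L^2(\mathbb{R}, dx) \otimes L^2(\mathbb{R}, dy)$ is naturally isomorphic to $L^2(\mathbb{R}^2, dx\,dy)$. If we
identify an element $\Psi$ of $L^2(\mathbb{R}, dx) \otimes L^2(\mathbb{R}, dy)$ with a linear combination of products of
functions,

$$ \Psi = \sum_{k,l} \alpha_{kl}\, \varphi_k \otimes \psi_l \cong \sum_{k,l} \alpha_{kl}\, \varphi_k(x) \psi_l(y) = \Psi(x,y) , $$

then the function $\Psi(x,y)$ is indeed an element of $L^2(\mathbb{R}^2, dx\,dy)$, since

$$ \iint |\Psi(x,y)|^2\, dx\, dy = \sum_{k,l,m,n} \alpha_{kl}^* \alpha_{mn} \Big[ \int \varphi_k^*(x) \varphi_m(x)\, dx \Big] \Big[ \int \psi_l^*(y) \psi_n(y)\, dy \Big]
= \sum_{k,l} |\alpha_{kl}|^2 < \infty . $$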

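Here and in the following, $(\varphi_k)_k$ and $(\psi_l)_l$ are orthonormal bases of $L^2(\mathbb{R}, dx)$. To
see that this map from $L^2(\mathbb{R}, dx) \otimes L^2(\mathbb{R}, dy)$ to $L^2(\mathbb{R}^2, dx\,dy)$ is indeed an
isomorphism, we show that the products $(\varphi_k(x) \psi_l(y))_{k,l}$ form an orthonormal basis of
$L^2(\mathbb{R}^2, dx\,dy)$. We see this by once again using our criterion (13.6) for orthonormal
bases. Let $\Psi \in L^2(\mathbb{R}^2, dx\,dy)$ be such that

$$ \iint \Psi^*(x,y)\, \varphi_k(x) \psi_l(y)\, dx\, dy = 0 , \qquad \forall\, k, l . $$

We show that the only vector orthogonal to all $(\varphi_k(x) \psi_l(y))_{k,l}$ is $\Psi = 0$. This follows
from Fubini's theorem:

$$ \int \Big[ \int \Psi^*(x,y)\, \varphi_k(x)\, dx \Big] \psi_l(y)\, dy = 0 , \qquad \forall\, l , $$

implies that the function

$$ f_k(y) = \int \Psi^*(x,y)\, \varphi_k(x)\, dx \in L^2(\mathbb{R}, dy) $$

vanishes outside a null set $N_k$. Hence, for $y \notin \bigcup_k N_k$, we have

$$ \int \Psi^*(x,y)\, \varphi_k(x)\, dx = 0 , \qquad \forall\, k . $$

But then $\Psi^*(x,y)$ vanishes almost everywhere (with respect to "$dx$") and $\Psi(x,y)$ is
zero almost everywhere with respect to "$dx\,dy$". Thus $(\varphi_k(x) \psi_l(y))_{k,l}$ is an orthonormal
basis of $L^2(\mathbb{R}^2, dx\,dy)$.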

As a consequence, the mapping

$$ U : \varphi_k \otimes \psi_l \mapsto \varphi_k(x) \psi_l(y) $$

maps an orthonormal basis of $L^2(\mathbb{R}, dx) \otimes L^2(\mathbb{R}, dy)$ onto an orthonormal basis of the
space $L^2(\mathbb{R}^2, dx\,dy)$, and we can extend it to a unitary operator

$$ U : L^2(\mathbb{R}, dx) \otimes L^2(\mathbb{R}, dy) \to L^2(\mathbb{R}^2, dx\,dy) $$

by linearity.

In this sense $L^2(\mathbb{R}, dx) \otimes L^2(\mathbb{R}, dy)$ is canonically isomorphic to $L^2(\mathbb{R}^2, dx\,dy)$,
and in general

$$ \bigotimes_{i=1}^{3N} L^2(\mathbb{R}, dx_i) \cong L^2(\mathbb{R}^{3N}, d^{3N}x) . \tag{13.32} $$


In Bohmian mechanics product functions $\varphi(x) \psi(y)$ (or $\varphi \otimes \psi$) in the tensor space
describe statistical and metaphysical independence of the $x$- and $y$-systems. In
general, however, the wave function evolves, through interaction potentials in the
Schrödinger equation, into a typical element of the tensor space that cannot be written
as a product, but is of the form

$$ \sum_{k,l} \alpha_{kl}\, \varphi_k \otimes \psi_l \cong \sum_{k,l} \alpha_{kl}\, \varphi_k(x) \psi_l(y) = \Psi(x,y) . $$


Remark 13.11. On Spinor Wave Functions 
For the space $L^2(\mathbb{R}^3, d^3x; \mathbb{C}^2)$ of spinor-valued wave functions

$$ \begin{pmatrix} \varphi_1(x) \\ \varphi_2(x) \end{pmatrix} , $$

we also find a natural isomorphism with the tensor space $L^2(\mathbb{R}^3, d^3x) \otimes \mathbb{C}^2$, viz.,

$$ L^2(\mathbb{R}^3, d^3x; \mathbb{C}^2) \cong L^2(\mathbb{R}^3, d^3x) \otimes \mathbb{C}^2 , $$

through the identification

$$ U : \varphi \otimes v \mapsto \varphi(x)\, v , $$

for $\varphi \in L^2(\mathbb{R}^3, d^3x)$ and $v \in \mathbb{C}^2$ and its linear extension. One can replace $\mathbb{C}^2$ by an
arbitrary Hilbert space. In particular, for the wave function space of $N$ particles with
spin,

$$ L^2(\mathbb{R}^{3N}, d^{3N}x; \mathbb{C}^{2^N}) \cong L^2(\mathbb{R}^{3N}, d^{3N}x) \otimes \mathbb{C}^{2^N} . $$

In many applications, one reduces spin-related problems to involve only the "spin
degrees of freedom". This is possible if the full wave function $\Psi \in L^2(\mathbb{R}^{3N}) \otimes \mathbb{C}^{2^N}$
has a product form $\Psi = \psi \otimes \varphi$ with $\psi \in L^2(\mathbb{R}^{3N})$ and $\varphi \in \mathbb{C}^{2^N}$, and if this product
structure is (approximately) conserved under the time evolution, i.e., if there is no
coupling between translational motion and spin dynamics. However, even if the
decoupling condition is not satisfied, e.g., when discussing the EPR experiment,
one often only explicitly considers the spin factor $\varphi \in \mathbb{C}^{2^N}$ and its dynamics. In
these cases it is very important to bear the full picture in mind.


Remark 13.12. On the Schmidt Basis in Tensor Spaces 
A given function $\psi(x,y)$ can always be represented in a bi-orthogonal basis with
non-negative coefficients, i.e.,

$$ \psi(x,y) = \sum_{n} \alpha_n\, \varphi_n(x)\, \psi_n(y) , \tag{13.33} $$

where $(\varphi_n)$ and $(\psi_n)$ are orthonormal bases and $\alpha_n \ge 0$. This is emphasized, for
example, in the context of the wave function of the universe, because (13.33) has
a formal similarity with the superposition emerging from a measurement process.
Mathematically, this observation goes back to work by Schmidt [5]. Although the
physical implications of (13.33) are overrated, it is often convenient to have the
representation (13.33) for other reasons, so we shall explain briefly how to get it.
In general, we showed that, for orthonormal bases $(\tilde\varphi_n)$ and $(\tilde\psi_m)$, one has

$$ \psi(x,y) = \sum_{n,m} \alpha_{nm}\, \tilde\varphi_n(x)\, \tilde\psi_m(y) . $$

For simplicity, we consider only finite sums and do some linear algebra with

$$ \psi(x,y) = \sum_{n,m=1}^{N} \alpha_{nm}\, \tilde\varphi_n(x)\, \tilde\psi_m(y) . $$

Under unitary transformations $S$ and $T$, i.e., $S^*S = T^*T = 1$, the orthonormal bases
$(\tilde\varphi_n)$ and $(\tilde\psi_n)$ are mapped to orthonormal bases:

$$ S^* \tilde\varphi_n = \varphi_n , \qquad T^* \tilde\psi_n = \psi_n . $$

With $A = (\alpha_{nm})$, we write

$$ \psi(x,y) = \sum_{n,m=1}^{N} (SAT)_{nm}\, \varphi_n(x)\, \psi_m(y) , $$

and hence (13.33) follows if we can show that there exist $S$ and $T$ that diagonalize
$A$, i.e.,

$$ SAT = D , \tag{13.34} $$

for some diagonal matrix $D$. To see that such matrices exist, let $(e_k)$ denote the
canonical basis of $\mathbb{R}^N$. With $T e_k =: t_k$ and $S s_k := e_k$, we can write (13.34) as

$$ SAT e_k = SA t_k = S \delta_k s_k = \delta_k e_k , $$

whenever

$$ A t_k = \delta_k s_k . \tag{13.35} $$

Then $D = (\delta_k)$. So the question now is whether there are bases $s_k$ and $t_k$ such that
(13.35) holds, and the answer is affirmative: they do indeed exist. With the adjoint
matrix $A^*$, we have that $A^*A$ and $AA^*$ are positive semi-definite self-adjoint matrices and hence
diagonalizable with non-negative eigenvalues $\alpha_k^2$ and $\tilde\alpha_k^2$ and orthonormal eigenbases
$t_k$ and $s_k$. But

$$ A^*A\, t_k = \alpha_k^2\, t_k $$

implies

$$ AA^*A\, t_k = \alpha_k^2\, A t_k , $$

i.e., $A t_k$ is an eigenvector of $AA^*$ (whose eigenvalues are $\tilde\alpha_l^2$ with eigenvectors $s_l$).
After renumbering where necessary, we have $\tilde\alpha_k^2 = \alpha_k^2$ and $A t_k = \delta_k s_k$ with

$$ \delta_k^2 = \langle \delta_k s_k, \delta_k s_k \rangle = \langle A t_k, A t_k \rangle = \langle A^*A\, t_k, t_k \rangle = \alpha_k^2 \langle t_k, t_k \rangle = \alpha_k^2 , $$

i.e., $\delta_k = \alpha_k$.
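The construction in Remark 13.12 can be followed step by step for a small coefficient matrix. The sketch below is our own illustration, not the authors' code. It assumes $N = 2$, an invertible example matrix `A` chosen arbitrarily, and a nonzero off-diagonal entry of $A^*A$ so that the closed-form eigenvectors are well defined. It computes the eigenpairs $(\alpha_k^2, t_k)$ of $A^*A$, sets $s_k = A t_k / \alpha_k$ as in (13.35), and rebuilds $A$ from the Schmidt terms.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Adjoint (conjugate transpose) of a matrix."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def normalize(v):
    n = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / n for x in v]

def schmidt_2x2(A):
    """Return [(alpha_k, s_k, t_k)] with A t_k = alpha_k s_k, for a 2x2 matrix A."""
    M = matmul(dagger(A), A)                 # A*A, self-adjoint and non-negative
    a, b, d = M[0][0].real, M[0][1], M[1][1].real
    mean = (a + d) / 2
    root = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    terms = []
    for lam in (mean + root, mean - root):   # eigenvalues alpha_k^2 of A*A
        t = normalize([b, lam - a])          # eigenvector of A*A (requires b != 0)
        alpha = math.sqrt(lam)               # requires lam > 0, i.e. A invertible
        s = [x / alpha for x in matvec(A, t)]
        terms.append((alpha, s, t))
    return terms

# Hypothetical coefficient matrix (alpha_nm), chosen only for illustration
A = [[1 + 0j, 2j], [0.5 + 0j, 1 - 1j]]
terms = schmidt_2x2(A)

# A = sum_k alpha_k s_k t_k^*, the matrix form of the diagonal representation
rebuilt = [[sum(al * s[i] * t[j].conjugate() for al, s, t in terms)
            for j in range(2)] for i in range(2)]
schmidt_error = max(abs(rebuilt[i][j] - A[i][j]) for i in range(2) for j in range(2))
```

For larger matrices one would normally use a library singular value decomposition rather than closed-form eigenvectors; the hand-written version is kept here only because it mirrors the argument of the remark.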


From now on, we shall say that the wave function is an element of a Hilbert space
$\mathcal{H}$, and we shall usually mean

$$ \mathcal{H} = L^2(\mathbb{R}^{3N}, d^{3N}x) . $$

We know that only the nice smooth functions in that space, the real wave functions,
are physically relevant.


Chapter 14
The Schrödinger Operator


In Bohmian mechanics, the dynamics of the wave function is determined by the
Schrödinger equation (8.4) and the dynamics of the particle positions is determined
by the guiding equation (8.3). In Remarks 7.1 and 8.2, we noted that in Bohmian
mechanics the wave function must be differentiable, i.e., Bohmian mechanics is
based on classical solutions of Schrödinger's equation. We will not be concerned
with classical solutions in the present chapter, which develops the point of view
already initiated in Chap. 12. We discuss here a new notion of solution in the sense
that the Schrödinger equation gives rise to a unitary time evolution on Hilbert space.


14.1 Unitary Groups and Their Generators 


As in every physical theory, the mathematical equations should be specified in such a 
way that, for suitable initial data, the solutions are uniquely determined for all times. 
In this chapter we discuss in particular the mathematical problem of setting up the
Schrödinger equation in such a way that the solutions are uniquely determined for
all times by the initial wave function at some arbitrary initial time. Moreover, the
solution $\psi(t)$ of the Schrödinger equation also enters the guiding equation for the
particles, for which we also expect the existence of solutions at all times. But if
(almost) all trajectories exist for all times, then equivariance of the $|\psi|^2$-distribution
leads to a further minimal requirement on $\psi(t)$, namely the conservation of the total
probability $\int |\psi(t)|^2\, d^n x = 1$, i.e., conservation of the $L^2$-norm.

However, for singular potentials, like the physically relevant Coulomb potential,
or for configuration spaces with boundary, the mere fact of specifying the potential
does not lead to either uniqueness or, in general, conservation of norm for the
solution of Schrödinger's equation. Indeed, in these cases, for any given initial data
$\psi_0$, there exist many solutions of Schrödinger's equation, some with growing or
decreasing norm. As we understood already in Chap. 12, one needs additional boundary
conditions in order to make the physical situation described by the equations
unique, and thus to select the correct physical solution. As we shall explain, using
the concept of self-adjointness one can at the same time enforce uniqueness of the
solution and conservation of norm. Then one is left to select the physically correct
self-adjoint version of the equation in order to get unique solutions.

We treat this problem as a mathematical problem, whence physical constants and
dimensions will be irrelevant, and we put $\hbar = m = 1$ so that the Schrödinger equation
becomes

$$ \mathrm{i}\, \frac{\partial \psi(x,t)}{\partial t} = -\frac{1}{2} \Delta_x \psi(x,t) + V(x)\, \psi(x,t) =: H(\Delta, V)\, \psi(x,t) , \tag{14.1} $$

with $\psi(t = 0) = \psi_0 \in L^2$ as initial condition.1 Thinking of $\psi(t,x)$ as a vector-valued
function $\psi : \mathbb{R} \to L^2$, $t \mapsto \psi(t)$, we can also understand (14.1) as an ordinary linear
differential equation

$$ \mathrm{i}\, \frac{d}{dt} \psi(t) = H \psi(t) . $$

Then at least formally we obtain $\psi(t) = e^{-\mathrm{i}tH} \psi_0$ as the solution and, since $H$ is
"real", we have, with the i in the exponent, that $U(t) = e^{-\mathrm{i}tH}$ is a bounded linear
operator on the Hilbert space $\mathcal{H}$. Indeed, formally we even expect $U(t) = e^{-\mathrm{i}tH}$
to be unitary, so that $\| \psi(t) \| = \| U(t) \psi(0) \| = \| \psi(0) \|$. Our formal expectations
motivate the following definition:

Definition 14.1. A (strongly continuous) unitary one-parameter group

$$ U(t) : \mathcal{H} \to \mathcal{H} $$

is a family of linear operators (one operator for each $t \in \mathbb{R}$) on a Hilbert space $\mathcal{H}$,
such that:

(i)   $t \mapsto U(t) \psi$ is continuous for each $\psi \in \mathcal{H}$ ,
(ii)  $U(t+s) = U(t) U(s)$ , $U(0) = 1$ ,
(iii) $\| U(t) \psi \| = \| \psi \|$ for all $t \in \mathbb{R}$ and all $\psi \in \mathcal{H}$ .


Condition (i) is called strong continuity, and it means that $\| U(t) \psi - U(s) \psi \| \to 0$
for $t \to s$. From (ii), it follows that $U(t)$ is invertible with inverse $U(-t)$. Hence,
together with (iii), it follows that $U(t)$ is indeed unitary [see also the paragraph
above (13.10)]. Note also that, for unitary groups, strong continuity is equivalent to
weak continuity, since

$$ \lim_{t \to 0} \big\| \big( U(t) - 1 \big) \psi \big\|^2 = 2 \| \psi \|^2 - 2 \lim_{t \to 0} \mathrm{Re} \langle \psi | U(t) \psi \rangle = 0 . $$
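The three properties can be checked numerically in the simplest nontrivial case. The sketch below is an illustration under assumptions of our own: the Hilbert space is $\mathbb{C}^2$ and $H$ is the Pauli matrix $\sigma_x$, for which $e^{-\mathrm{i}tH} = \cos t \cdot 1 - \mathrm{i} \sin t \cdot \sigma_x$ in closed form. It tests the group property, the conservation of norm, and that the difference quotient $\mathrm{i}\,(U(h)\psi - \psi)/h$ approaches $H\psi$ for small $h$, in line with the formal equation $\mathrm{i}\, d\psi/dt = H\psi$.

```python
import math

# Assumption for illustration: H = sigma_x on C^2, so that
# U(t) = exp(-itH) = cos(t) * 1 - i sin(t) * sigma_x.
def U(t):
    c, off = math.cos(t), -1j * math.sin(t)
    return [[c, off], [off, c]]

def apply(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def norm(v):
    return math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)

psi = [0.6, 0.8j]
t, s = 0.7, -1.9

# (ii) group property: U(t+s) psi = U(t) U(s) psi
lhs = apply(U(t + s), psi)
rhs = apply(U(t), apply(U(s), psi))
group_error = max(abs(x - y) for x, y in zip(lhs, rhs))

# (iii) conservation of norm
norm_error = abs(norm(apply(U(t), psi)) - norm(psi))

# difference quotient i (U(h) psi - psi) / h versus H psi = sigma_x psi
h = 1e-6
quotient = [1j * (x - y) / h for x, y in zip(apply(U(h), psi), psi)]
H_psi = [psi[1], psi[0]]
generator_error = max(abs(x - y) for x, y in zip(quotient, H_psi))
```

In finite dimensions every Hermitian matrix generates such a group and no domain questions arise; the subtleties discussed in this chapter appear only for unbounded operators.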


This definition captures all our requirements for the Schrödinger flow on the space
of wave functions: existence, uniqueness, and conservation of norm. The only thing
missing is the connection with the Schrödinger equation, namely the requirement
that

$$ \mathrm{i}\, \frac{d}{dt} U(t) = H U(t) $$

should hold with $H$ as the Hamilton operator, or for short, the Hamiltonian, on $\mathcal{H}$.
At this point we must be careful to distinguish between $H(\Delta, V)$ in (14.1) and the
Hamilton operator $H$. Why is this?

1 In the last chapter, we saw that derivatives can be defined in a weak or distributional sense, so we
know how $\Delta_x$ acts on an $L^2$-function, even if the latter is not differentiable.

In fact, $H(\Delta, V)$ is just a differential operator defined on a subset of differentiable
$L^2$-functions, e.g., on $C_0^\infty$. In particular, while densely defined, $H(\Delta, V)$ is certainly
not an operator defined on all of $L^2$. And if we define it on all of $L^2$ in the
distributional sense, then it does not map $L^2$ into $L^2$. Hence we have to specify on precisely
which set we want $H$ to act,2 i.e., we have to specify its domain $\mathcal{D}(H)$.

Recall the simple example of the particle on the half-line. There the prescription
for "solving" the Schrödinger equation only for $x > 0$ and for initial data $\psi_0$
with $\mathrm{supp}\, \psi_0 \subset (0, \infty)$ does not yield a unique solution. We had to specify boundary
conditions, but this just means that, for each boundary condition $a \in \mathbb{R}$ that
we impose [see (12.50)], we pick a domain of definition $\mathcal{D}(H_a)$ for $H$. Hence we
actually talk about different operators $H_a$ that all act like $-\Delta/2$ on the functions in
their domain, but the domains $\mathcal{D}(H_a)$ differ. And this is exactly the subtlety we need
to consider when introducing the Schrödinger operator $H$ together with its domain
$\mathcal{D}(H)$.

But what exactly are good boundary conditions? Just specifying some domain
$\mathcal{D}(H)$ will not do anything for us, as the simple example of the half-line shows. By
just putting

$$ \mathcal{H} = L^2((0, \infty), dx) , \qquad H = -\frac{1}{2} \frac{d^2}{dx^2} , \qquad \mathcal{D}(H) = C_0^\infty((0, \infty)) , $$

everything is precisely defined, but our problem is not solved. For an initial
$\varphi \in \mathcal{D}(H)$, there is no solution with $\varphi(t) \in \mathcal{D}(H)$ for $t > 0$, but many different solutions
with $\varphi(t) \notin \mathcal{D}(H)$.

Hence "good" boundary conditions, or more generally "good" domains $\mathcal{D}(H)$,
should be such that, for any initial state $\varphi(0)$ in the domain, there is a unique
solution $\varphi(t)$ of (14.1) which remains in the domain for all times. In other words the
unitary group $U(t)$ corresponding to $H$ with domain $\mathcal{D}(H)$ should leave the domain
invariant:

$$ U(t)\, \mathcal{D}(H) = \mathcal{D}(H) , \qquad \text{for all } t \in \mathbb{R} . $$


We now turn the desired connection between the unitary solution group $U(t)$ and its
generator $H$ with domain $\mathcal{D}(H)$ into a definition. By doing this, we adopt a rather
non-standard approach, turning everything upside down, as it may seem. But, at
least for the present purpose of understanding the time evolution in Bohmian (and
thus in quantum) mechanics, this is indeed the proper way to proceed.

2 Linear operators that cannot be defined on the whole Hilbert space, but only on a dense subspace,
are called densely defined unbounded operators. If a densely defined operator were bounded, it
could be uniquely extended to the whole space by continuity.

To recapitulate, we want to identify operators $H$ and domains $\mathcal{D}(H)$ such that the
solutions of Schrödinger's equation are given by a unique unitary group $U(t)$, i.e.,
the unique solution to

$$ \mathrm{i}\, \frac{d}{dt} U(t) = H U(t) . $$

As we shall see, all requirements are captured by the following definition.


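Definition 14.2. A densely defined operator $H$ is called the generator of a unitary
group $U(t)$ if the following holds:

(i)  $\mathcal{D}(H) = \{ \varphi \in \mathcal{H} \mid t \mapsto U(t) \varphi \text{ is differentiable} \}$ ,
(ii) $\mathrm{i}\, \frac{d}{dt} U(t) \varphi = H U(t) \varphi$ , for all $\varphi \in \mathcal{D}(H)$ .


Note. The statement that $t \mapsto U(t) \varphi$ is differentiable means (using the group
property) that there is an element $\psi \in \mathcal{H}$ such that

$$ \lim_{t \to 0} \Big\| \frac{U(t) \varphi - \varphi}{t} - \psi \Big\| = 0 . $$

It then follows from Definition 14.2 (ii) that this $\psi$ is actually given by $-\mathrm{i} H \varphi$.
This motivates the notation $U(t) = e^{-\mathrm{i}tH}$ (see also Remark 14.3). In particular, we
can also differentiate within the inner product (actually within any bounded linear
functional). For an arbitrary vector $\psi \in \mathcal{H}$,

$$ \begin{aligned}
\frac{d}{dt} \langle \psi | U(t) \varphi \rangle \Big|_{t=0}
&= \lim_{t \to 0} \Big\langle \psi \Big| \frac{U(t) \varphi - \varphi}{t} \Big\rangle \\
&= \lim_{t \to 0} \Big\langle \psi \Big| \frac{U(t) \varphi - \varphi}{t} + \mathrm{i} H \varphi \Big\rangle - \langle \psi | \mathrm{i} H \varphi \rangle \\
&= - \langle \psi | \mathrm{i} H \varphi \rangle ,
\end{aligned} $$

where the first term is zero by continuity of the inner product.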


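The definition is very compact. For example, Definition 14.2 (i) together with the
group property of $U(t)$ implies the invariance of the domain $\mathcal{D}(H)$, i.e., for all $t \in \mathbb{R}$,
one has $U(t)\, \mathcal{D}(H) = \mathcal{D}(H)$. Moreover, it follows that $U(t) H \varphi = H U(t) \varphi$ for all
$t \in \mathbb{R}$, since

$$ U(t) H \varphi = U(t)\, \mathrm{i}\, \frac{d}{ds} U(s) \varphi \Big|_{s=0} = \mathrm{i}\, \frac{d}{ds} U(s) U(t) \varphi \Big|_{s=0} = H U(t) \varphi . $$

Hence, $\| H U(t) \varphi \| = \| H \varphi \|$.

It is also easy to see that the group $U(t)$ is uniquely determined by $H$. Let $\tilde U(t)$
be a unitary group, also generated by $H$, and consider

$$ \begin{aligned}
\frac{d}{dt} \big\| \big( U(t) - \tilde U(t) \big) \varphi \big\|^2
&= 2\, \frac{d}{dt} \Big( \| \varphi \|^2 - \mathrm{Re} \langle U(t) \varphi | \tilde U(t) \varphi \rangle \Big) \\
&= -2\, \mathrm{Re} \Big[ \langle -\mathrm{i} H U(t) \varphi | \tilde U(t) \varphi \rangle + \langle U(t) \varphi | -\mathrm{i} H \tilde U(t) \varphi \rangle \Big] \\
&= -2\, \mathrm{Re} \Big[ \mathrm{i} \langle H U(t) \varphi | \tilde U(t) \varphi \rangle - \mathrm{i} \langle U(t) \varphi | H \tilde U(t) \varphi \rangle \Big] .
\end{aligned} $$

This is zero if $H$ is symmetric and then we would have uniqueness, since

$$ \big\| \big( U(t) - \tilde U(t) \big) \varphi \big\| = \big\| \big( U(0) - \tilde U(0) \big) \varphi \big\| = 0 , $$

by (ii) in Definition 14.1.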


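Definition 14.3. An operator $H$ is called symmetric (or Hermitian), if $\langle \varphi | H \psi \rangle =
\langle H \varphi | \psi \rangle$ holds for all $\varphi, \psi \in \mathcal{D}(H)$.


And indeed we have that

$$ H \text{ generator} \implies H \text{ symmetric} , \tag{14.2} $$

since, for $\varphi, \psi \in \mathcal{D}(H)$, it follows that

$$ \begin{aligned}
0 = \frac{d}{dt} \langle \varphi | \psi \rangle
&= \frac{d}{dt} \langle U(t) \varphi | U(t) \psi \rangle \\
&= \langle -\mathrm{i} H U(t) \varphi | U(t) \psi \rangle + \langle U(t) \varphi | -\mathrm{i} H U(t) \psi \rangle \\
&= \mathrm{i} \langle U(t) H \varphi | U(t) \psi \rangle - \mathrm{i} \langle U(t) \varphi | U(t) H \psi \rangle \\
&= \mathrm{i} \big( \langle H \varphi | \psi \rangle - \langle \varphi | H \psi \rangle \big) .
\end{aligned} $$

In conclusion, if the Schrödinger operator is the generator of a unitary group, then
we have all we were asking for: existence, uniqueness, and conservation of norm.
Hence all we are missing is a good criterion to actually determine whether a given
operator generates a unitary group.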

To this end, first recall (12.47), the vanishing of the flux integral that yields the
conservation of probability:

$$ \begin{aligned}
\frac{d}{dt} \int |\psi(t)|^2\, d^n x
&= -\mathrm{i} \int \Big[ \psi^*(t)\, H(\Delta, V) \psi(t) - \psi(t)\, \big( H(\Delta, V) \psi(t) \big)^* \Big]\, d^n x \\
&= - \int \nabla \cdot j^{\psi(t)}\, d^n x = - \int j^{\psi(t)} \cdot d\sigma = 0 .
\end{aligned} \tag{14.3} $$

Hence, we find the criterion

$$ \int \psi^*\, H(\Delta, V) \psi\, d^n x = \int \big( H(\Delta, V) \psi \big)^* \psi\, d^n x , \tag{14.4} $$

which is just the symmetry we already have. It is clear that this cannot be the end of
the story. In our example,

$$ H = -\frac{1}{2} \frac{d^2}{dx^2} \qquad \text{with} \qquad \mathcal{D}(H) = C_0^\infty((0, \infty)) $$

is symmetric [see (14.11)]. But $H$ cannot be a generator because the time evolution
is not uniquely determined.

How can (14.4) hold, when (14.3) does not? In the example, we saw that the
left-hand side of (14.3) can be negative, since there are solutions where the particle
leaves the half-line at the origin and vanishes. But such solutions are not zero at
the origin, $\psi(t, x = 0) \neq 0$, and hence $\psi(t) \notin \mathcal{D}(H)$. Our notation in (14.4) was
sloppy, since (14.4) must hold for $\psi(t)$. And in general it can happen that $\psi(0) \in
\mathcal{D}(H)$ but $\psi(t) \notin \mathcal{D}(H)$, and then (14.4) is not defined. We thus need to pick the
domain $\mathcal{D}(H)$ of $H$ so that it remains invariant under the time evolution. The domain
$\mathcal{D}(H_0) = C_0^\infty((0, \infty))$ for $H_0 = -(1/2)\, d^2/dx^2$ is certainly not invariant under the
time evolution, since sooner or later (actually sooner) the solution will reach the
origin.


14.2 Self-Adjoint Operators 


So far so good. The invariance of the domain was already part of Definition 14.2, but 
we have come a good way towards uncovering the difference between “symmetric” 
and “generating”. Let us repeat what we have understood so far. If the domain is 
too small, there may be no solutions that stay within the domain. On the other hand, 
if the domain is too big, there may be more than one solution. So exactly how big 
should the domain of a generator be? Let us try to answer that question heuristically 
to begin with. 

Consider a symmetric operator $H_0$ with a (possibly too small) domain $\mathcal{D}(H_0)$.
Let $H_{\max}$ be the "same operator" on a domain $\mathcal{D}(H_{\max}) \supset \mathcal{D}(H_0)$ on which it is
maximally symmetric. More precisely, we assume that $H_{\max}|_{\mathcal{D}(H_0)} = H_0$, that $H_{\max}$
is still symmetric, but that there is no larger domain to which $H_{\max}$ can be extended
and still be a symmetric operator. Now let us assume that, at least for small times
$|t| < \tau$ and $\psi(0) \in \mathcal{D}(H_0)$, the Schrödinger equation

$$ \mathrm{i}\, \frac{d}{dt} \psi(t) = H_{\max} \psi(t) $$

has a solution $\psi(t)$ with $\psi(t) \in \mathcal{D}(H_{\max})$, but not necessarily $\psi(t) \in \mathcal{D}(H_0)$. But
where can $\psi(t)$ go when it leaves $\mathcal{D}(H_0)$? Because of the symmetry of $H_{\max}$, and
because $H_{\max}$ extends $H_0$, i.e., $\mathcal{D}(H_{\max}) \supset \mathcal{D}(H_0)$ and $H_{\max}|_{\mathcal{D}(H_0)} = H_0$, we have,
for all $\varphi \in \mathcal{D}(H_0)$,

$$ \langle \varphi | H_{\max} \psi(t) \rangle = \langle H_{\max} \varphi | \psi(t) \rangle = \langle H_0 \varphi | \psi(t) \rangle . \tag{14.5} $$

Hence $\psi(t)$ always ends up in the domain of the adjoint operator $H_0^*$, which we now
define with (14.5) in mind.


Definition 14.4. Let $H$ be a densely defined linear operator on $\mathcal{H}$. Let $\mathcal{D}(H^*)$ be
the set of all $\psi \in \mathcal{H}$ for which there exists an $\eta \in \mathcal{H}$ such that

$$ \langle \psi | H \varphi \rangle = \langle \eta | \varphi \rangle , \qquad \text{for all } \varphi \in \mathcal{D}(H) . \tag{14.6} $$

For each $\psi \in \mathcal{D}(H^*)$, we define $H^* \psi = \eta$ and call $H^*$ the adjoint operator to $H$.
Hence we have $\langle \psi | H \varphi \rangle = \langle H^* \psi | \varphi \rangle$ for all $\varphi \in \mathcal{D}(H)$ and $\psi \in \mathcal{D}(H^*)$.


Since $\mathcal{D}(H)$ is dense, $\eta$ in (14.6) is unique if it exists, and the operator $H^*$ is
thus also uniquely defined. The star notation is reminiscent of complex conjugation.
This is intended, since the adjoint of a complex number $c$, or more precisely
of the operator $c\,1$, is just the complex conjugate number $c^*$. While $\mathcal{D}(H^*)$ need not
be dense in general, for symmetric operators $H$, the adjoint $H^*$ is again a densely
defined operator. This is because $H$ is symmetric (see Definition 14.3) if and only if
$\mathcal{D}(H) \subset \mathcal{D}(H^*)$ and $H \varphi = H^* \varphi$ for all $\varphi \in \mathcal{D}(H)$. We now formulate the result of
our heuristic argument (14.5) as a theorem.


Theorem 14.1. $H$ is the generator of a strongly continuous unitary group if and
only if $H = H^*$, i.e., $H$ is symmetric and $\mathcal{D}(H) = \mathcal{D}(H^*)$. Such an operator $H$ is
said to be self-adjoint.


This is the main content of Stone’s theorem [1]. Stone’s theorem states in addition 
that any strongly continuous unitary group has a generator, which by the above 
statement must be self-adjoint. Self-adjointness is a property of operators that is of 
general mathematical interest. As such, it is usually considered independently of 
the property of being a generator, and this is why we did not introduce only the 
terminology “self-adjoint”. However, in the following, we will use it synonymously 
with “generator”. 

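The following argument shows once again why we expect any generator $H$ of a
unitary group to be self-adjoint. Let $\psi \in \mathcal{D}(H^*)$. Then we have, for all $\varphi \in \mathcal{D}(H)$,

$$ \langle \psi | H \varphi \rangle = \Big\langle \psi \Big| \mathrm{i}\, \frac{d}{dt} U(t) \varphi \Big\rangle \Big|_{t=0}
= \mathrm{i}\, \frac{d}{dt} \langle \psi | U(t) \varphi \rangle \Big|_{t=0}
= \mathrm{i}\, \frac{d}{dt} \langle U(-t) \psi | \varphi \rangle \Big|_{t=0} . $$

Since $\psi \in \mathcal{D}(H^*)$, there exists $\eta \in \mathcal{H}$ such that $\langle \psi | H \varphi \rangle = \langle \eta | \varphi \rangle$ and thus

$$ \mathrm{i}\, \frac{d}{dt} \langle U(-t) \psi | \varphi \rangle \Big|_{t=0} = \langle \eta | \varphi \rangle , $$

i.e., $U(-t) \psi$ is weakly differentiable at $t = 0$. If we assume for the moment that this
implies also strong differentiability of $U(-t) \psi$, then it would follow that $\psi \in \mathcal{D}(H)$,
and therefore $\mathcal{D}(H) = \mathcal{D}(H^*)$, i.e., $H^* = H$.

In order to close the gap in the above argument and to understand that the converse
is also true, i.e., $H$ self-adjoint $\implies$ $H$ generates a unitary group, we need
to develop the idea of self-adjointness. In particular, we need to find a convenient
criterion for self-adjointness, since explicitly determining the domain of the adjoint
operator is not practicable in many cases.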

The way we would like to proceed for the Schrödinger operator is the following.
We start with a simple set of "nice" functions, e.g., $C_0^\infty(\mathbb{R}^n)$, on which the operator
$H(\Delta, V)$ is symmetric. Then we extend this set further by adding, e.g., all twice
differentiable functions $\varphi$ with $H(\Delta, V) \varphi \in L^2$, and so on. But we have already
seen that, if we add too much, the operator may no longer be symmetric. On the
other hand, if the domain $\mathcal{D}(H)$ is too small, then $\mathcal{D}(H^*)$ is too big, i.e., $\mathcal{D}(H^*) \supsetneq
\mathcal{D}(H)$. One can study the question of "balancing" the domains most conveniently
by considering the graph of an operator.


Definition 14.5. The graph of an operator $H$ is the linear subspace

$$ \Gamma(H) = \{ (\varphi, H \varphi) \mid \varphi \in \mathcal{D}(H) \} \subset \mathcal{H} \oplus \mathcal{H} . $$

We say that $H$ is closed, if $\Gamma(H)$ is a closed subset of $\mathcal{H} \oplus \mathcal{H}$.
Clearly two operators are the same if and only if their graphs are the same.


Definition 14.6. Let $H_1$ and $H$ be operators on $\mathcal{H}$. If $\Gamma(H) \subset \Gamma(H_1)$, then $H_1$ is
an extension of $H$, in short $H \subset H_1$. An operator $H$ is called closable if it has a
closed extension. The smallest closed extension of a closable operator $H$ is called
the closure of $H$ and denoted by $\overline{H}$. Clearly, $\Gamma(\overline{H}) = \overline{\Gamma(H)}$.


For symmetric operators $H$, we have seen that the adjoint $H^*$ is an extension of $H$,
so in a certain sense $\mathcal{D}(H^*)$ contains all that was missing in $\mathcal{D}(H)$. It is not therefore
too surprising to find that the following theorem holds.


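Theorem 14.2. Let $H$ be a densely defined operator on $\mathcal{H}$. Then $H^*$ is closed.


The proof uses concepts that turn out to be useful for the following. According to
Definition 14.4, we have

$$ \begin{aligned}
(\psi, \eta) \in \Gamma(H^*) &\iff \langle \psi | H \varphi \rangle = \langle \eta | \varphi \rangle , \quad \text{for all } \varphi \in \mathcal{D}(H) , \\
&\iff \langle \psi | H \varphi \rangle - \langle \eta | \varphi \rangle = 0 , \quad \text{for all } \varphi \in \mathcal{D}(H) ,
\end{aligned} $$

or equivalently, with the inner product on $\mathcal{H} \oplus \mathcal{H}$ [see (13.29)],

$$ \langle (\psi, \eta) | (-H \varphi, \varphi) \rangle = 0 , \qquad \text{for all } \varphi \in \mathcal{D}(H) . \tag{14.7} $$

Geometrically, this means that $(\psi, \eta)$ is orthogonal to $(-H \varphi, \varphi)$, which is essentially
the graph of $H$. Introducing the unitary map

$$ W : \mathcal{H} \oplus \mathcal{H} \to \mathcal{H} \oplus \mathcal{H} , \qquad (\varphi, \psi) \mapsto W(\varphi, \psi) = (-\psi, \varphi) , $$

equation (14.7) says that $(\psi, \eta) \in \Gamma(H^*)$ if and only if $(\psi, \eta) \in W(\Gamma(H))^\perp$, i.e.,

$$ \Gamma(H^*) = W(\Gamma(H))^\perp . \tag{14.8} $$

Since orthogonal complements are always closed [see (13.25)], $\Gamma(H^*)$ is closed.


Now, if $H$ is symmetric, $H$ is closable since $H \subset H^*$. But $H^*$ need not be the closure
$\overline{H}$ of $H$, and $H$ is not necessarily the adjoint of $H^*$.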


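Corollary 14.1. Let $H$ be densely defined and closable and assume that $H^*$ is also
densely defined. Then $\overline{H} = H^{**} := (H^*)^*$ and $(\overline{H})^* = H^{***} = H^*$.


Proof. It is easy to see that, for arbitrary subspaces $\mathcal{M} \subset \mathcal{H} \oplus \mathcal{H}$, one has
$W(\mathcal{M}^\perp) = W(\mathcal{M})^\perp$. From (14.8), we conclude that

$$ \begin{aligned}
\Gamma(H^{**}) = W(\Gamma(H^*))^\perp
&= \Big[ W\big( W(\Gamma(H))^\perp \big) \Big]^\perp
= W\Big( \big( W(\Gamma(H))^\perp \big)^\perp \Big) \\
&= W\Big( \big( W(\Gamma(H)^\perp) \big)^\perp \Big)
= W\Big( W\big( (\Gamma(H)^\perp)^\perp \big) \Big) .
\end{aligned} $$

For general subspaces $\mathcal{M}$, we have

$$ (\mathcal{M}^\perp)^\perp = \overline{\mathcal{M}} $$

(not $\mathcal{M}$ but its closure, since orthogonal complements are always closed). Hence
with $W^2 = -1$, we finally get

$$ \Gamma(H^{**}) = \overline{\Gamma(H)} = \Gamma(\overline{H}) . $$

By the same reasoning,

$$ \Gamma(H^{***}) = \overline{\Gamma(H^*)} = \Gamma(H^*) , $$

since $H^*$ is closed.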


For symmetric $H$, we thus have

$$ \text{(i)} \quad H \subset \overline{H} = H^{**} \subset H^* , \qquad \text{(ii)} \quad (\overline{H})^* = H^* , \tag{14.9} $$

and, if $H$ is self-adjoint,

$$ H = H^{**} = H^* . \tag{14.10} $$

From (14.9) (i) we see immediately that, if $H^*$ is symmetric, i.e., $H^* \subset H^{**}$, then

$$ H^* \subset H^{**} = \overline{H} \subset H^* , $$

and thus $H^* = \overline{H}$ is self-adjoint. This motivates the following definition.

Definition 14.7. A symmetric operator $H$ is essentially self-adjoint if its closure $\overline{H}$
is self-adjoint.

Corollary 14.2. A symmetric operator $H$ is essentially self-adjoint if and only if $H^*$
is symmetric. Then $\overline{H} = H^*$.


Proof. We have already shown that $H^*$ symmetric $\implies$ $H$ essentially self-adjoint
and $\overline{H} = H^*$. For the other direction, let $H$ be essentially self-adjoint, i.e., $\overline{H} = (\overline{H})^*$.
With (14.9) (ii), we have $(\overline{H})^* = H^*$, and with the equality $\overline{H} = H^{**}$ in (i), it follows
that $H^* = H^{**}$. Thus $H^*$ is self-adjoint and in particular symmetric.


Essential self-adjointness is sufficient for characterizing a self-adjoint operator
uniquely. We only need to know a domain of essential self-adjointness or a core
of a self-adjoint operator $H$, where $\mathcal{D} \subset \mathcal{D}(H)$ is a core for $H$ if

$$ \overline{H|_{\mathcal{D}}} = H , $$

i.e., if the closure of the restriction of $H$ to $\mathcal{D}$ is again $H$ itself.
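Let us examine the concept of self-adjointness for two simple but very instructive
examples. The free Hamiltonian

$$ H_0 = -\frac{d^2}{dx^2} \qquad \text{with domain } \mathcal{D}(H_0) = C_0^\infty(\mathbb{R}) , \tag{14.11} $$

is essentially self-adjoint. For $\varphi \in C_0^\infty$ and $\psi \in L^2$ such that $\psi'' \in L^2$, we have

$$ \begin{aligned}
\langle \psi | H_0 \varphi \rangle = \int \psi^* H_0 \varphi\, dx = - \int \psi^* \varphi''\, dx
&= \int \psi'^* \varphi'\, dx - \psi^* \varphi' \Big|_{-\infty}^{\infty} \\
&= - \int \psi''^* \varphi\, dx + \psi'^* \varphi \Big|_{-\infty}^{\infty} - \psi^* \varphi' \Big|_{-\infty}^{\infty} ,
\end{aligned} \tag{14.12} $$

where the derivatives on $\psi^*$ are taken in the distributional sense. From this we read
off the following facts:

1. $H_0$ is symmetric on $C_0^\infty(\mathbb{R})$, since for $\varphi \in C_0^\infty$ the boundary terms vanish, and thus
   for $\psi, \varphi \in C_0^\infty$, the right-hand side equals $\langle H_0 \psi | \varphi \rangle_{L^2}$.
2. The domain of the adjoint operator $H_0^*$ is

   $$ \mathcal{D}(H_0^*) = \{ \psi \in L^2 \mid \psi'' \in L^2 \} = \{ \psi \in L^2 \mid k^2 \hat\psi \in L^2 \} , $$

   since exactly for $\psi \in \mathcal{D}(H_0^*)$, we have

   $$ - \int \psi''^* \varphi\, dx = \langle -\psi'' | \varphi \rangle_{L^2} . $$

3. $H_0^*$ is symmetric and thus, according to Corollary 14.2, $H_0^* = \overline{H_0}$ is self-adjoint.
   Symmetry of $H_0^*$ follows most directly in the Fourier representation,

   $$ \langle \psi | H_0^* \varphi \rangle = \langle \hat\psi | k^2 \hat\varphi \rangle = \langle k^2 \hat\psi | \hat\varphi \rangle = \langle \widehat{H_0^* \psi} | \hat\varphi \rangle = \langle H_0^* \psi | \varphi \rangle . $$

   Alternatively, we can show symmetry of $H_0^*$ along the lines of (14.12). From
   Remark 13.8, we take

   $$ \widehat{\psi''} = -k^2 \hat\psi \in L^2 , $$

   and since $k^2 \hat\psi \in L^2$ and $\hat\psi \in L^2$, we have

   $$ \langle \hat\psi | k^2 \hat\psi \rangle = \langle k \hat\psi | k \hat\psi \rangle = \| k \hat\psi \|^2 , $$

   i.e., $k \hat\psi \in L^2$ and therefore $\psi' \in L^2$. From this we get that $\psi, \psi', \psi'' \in L^2$ as well
   as $\psi(x) \to 0$ and $\psi'(x) \to 0$, for $|x| \to \infty$. Hence, the boundary terms also
   vanish for $\psi \in \mathcal{D}(H_0^*)$.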


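On the other hand,

$$ H_0 = -\frac{d^2}{dx^2} \qquad \text{with domain } \mathcal{D}(H_0) = C_0^\infty((0, \infty)) \subset L^2(\mathbb{R}^+, dx) \tag{14.13} $$

is not essentially self-adjoint. For $\varphi \in C_0^\infty((0, \infty))$, one has

$$ \int_0^\infty \psi^* H_0 \varphi\, dx = - \int_0^\infty \psi^* \varphi''\, dx
= - \int_0^\infty \psi''^* \varphi\, dx + \psi'^* \varphi \Big|_0^\infty - \psi^* \varphi' \Big|_0^\infty . \tag{14.14} $$

As in the previous example one reads off directly that:

1. $H_0$ is symmetric on $C_0^\infty((0, \infty))$.
2. The domain of $H_0^*$ is $\mathcal{D}(H_0^*) = \{ \psi \in L^2(\mathbb{R}^+, dx) \mid \psi'' \in L^2(\mathbb{R}^+, dx) \}$.
3. $H_0^*$ is not symmetric on $\mathcal{D}(H_0^*)$, since the boundary terms at 0 no longer vanish.
   (The boundary terms at $\infty$ vanish by the same argument as in the previous
   example.) Hence $H_0$ is not essentially self-adjoint on $C_0^\infty((0, \infty))$.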


The problem in this example is clearly that $\mathcal{D}(H_0^*)$ is too big for $-d^2/dx^2$ to still be
symmetric. By inspecting (14.14), we see that we can shrink $\mathcal{D}(H_0^*)$ by enlarging
$\mathcal{D}(H_0)$. But we have to do it carefully in order not to destroy the symmetry of $H_0$. In
this simple example, the solution is easy to guess: if the boundary terms in (14.14)
at 0 do not vanish individually, they must at least cancel out. This happens if, for
$a \in \mathbb{R}$, we define the operator

$$ H_{0,a} = -\frac{d^2}{dx^2} $$

with domain

$$ \mathcal{D}(H_{0,a}) = \{ \varphi \in L^2(\mathbb{R}^+, dx) \mid \varphi'' \in L^2(\mathbb{R}^+, dx) ,\ \varphi'(0) = a \varphi(0) \} . $$

And indeed, $H_{0,a}$ is self-adjoint [see also (12.50)], since, for $\varphi \in \mathcal{D}(H_{0,a})$, we have

$$ \begin{aligned}
\int_0^\infty \psi^* H_{0,a} \varphi\, dx
&= - \int_0^\infty \psi''^* \varphi\, dx - \psi'^*(0) \varphi(0) + \psi^*(0) \varphi'(0) \\
&= - \int_0^\infty \psi''^* \varphi\, dx - \psi'^*(0) \varphi(0) + a\, \psi^*(0) \varphi(0) .
\end{aligned} $$

Now $\psi \in \mathcal{D}(H_{0,a}^*)$ if and only if the boundary terms vanish and $\psi'' \in L^2(\mathbb{R}^+, dx)$.
However, the boundary terms vanish if and only if $\psi'(0) = a \psi(0)$, and therefore
$\mathcal{D}(H_{0,a}^*) = \mathcal{D}(H_{0,a})$.
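The cancellation of the boundary terms can be observed numerically. The following sketch is our own illustration with hypothetical test functions: `phi` and `psi` are smooth, decaying, real-valued functions chosen to satisfy $f'(0) = a f(0)$, while `chi` violates this condition for $a \neq -1$. Their second derivatives were computed by hand. Integration by parts gives $\langle u | H v \rangle - \langle H u | v \rangle = u(0) v'(0) - u'(0) v(0)$ for real $u, v$, which is zero inside the domain and equals $a + 1$ for the pair `chi`, `phi`.

```python
import math

a = 0.5  # boundary parameter in f'(0) = a * f(0); any real value would do

def phi(x):   return (1 + a * x) * math.exp(-x * x)          # phi'(0) = a * phi(0)
def d2phi(x): return (-2 - 6 * a * x + 4 * x * x + 4 * a * x ** 3) * math.exp(-x * x)

def psi(x):   return (1 + (a + 1) * x) * math.exp(-x)        # psi'(0) = a * psi(0)
def d2psi(x): return (-2 * a - 1 + (a + 1) * x) * math.exp(-x)

def chi(x):   return math.exp(-x)                            # chi'(0) = -chi(0)
def d2chi(x): return math.exp(-x)

def integrate(f, upper=40.0, n=40000):
    """Composite Simpson rule on [0, upper]; all integrands decay rapidly."""
    h = upper / n
    total = f(0.0) + f(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3

def asymmetry(u, d2u, v, d2v):
    """<u|Hv> - <Hu|v> for H = -d^2/dx^2 and real-valued u, v."""
    return integrate(lambda x: -u(x) * d2v(x)) - integrate(lambda x: -d2u(x) * v(x))

inside = asymmetry(psi, d2psi, phi, d2phi)    # both functions satisfy the condition
outside = asymmetry(chi, d2chi, phi, d2phi)   # chi does not: boundary term a + 1
```

A numerical check of this kind only illustrates symmetry on a few functions; it does not replace the domain argument given above.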

In general, however, one cannot explicitly compute the domain of the adjoint op- 
erator, and what we need is an abstract and at the same time accessible criterion for 
self-adjointness. To recapitulate, in order to have H = H* for a symmetric operator 
H, necessary and sufficient conditions are that H is closed (since adjoints are always 
closed) and that H* is symmetric. Since symmetric operators have real eigenvalues, 
a necessary condition for $H^*$ to be symmetric is that neither $i$ nor $-i$ is an eigenvalue of $H^*$.
The following theorem states that this last condition is indeed sufficient to ensure 
self-adjointness of H. 


Theorem 14.3. Let $H$ be a densely defined symmetric operator on a Hilbert space
$\mathcal{H}$. Then the following assertions are equivalent:


(i) $H$ is self-adjoint.
(ii) $H$ is closed and $\mathrm{Ker}(H^* \pm i) = \{0\}$.
(iii) $\mathrm{Ran}(H \pm i) = \mathcal{H}$.


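Proof. (i) $\Rightarrow$ (ii). As argued before, if $H$ is self-adjoint then $H^* = H$ is symmetric
and thus only has real eigenvalues. The argument for this is the same as in linear
algebra. Let $A$ be symmetric and $A\varphi = \lambda\varphi$ for $\varphi \in \mathcal{D}(A) \setminus \{0\}$. Then,

$$\lambda\langle\varphi|\varphi\rangle = \langle\varphi|\lambda\varphi\rangle = \langle\varphi|A\varphi\rangle = \langle A\varphi|\varphi\rangle = \langle\lambda\varphi|\varphi\rangle = \lambda^*\langle\varphi|\varphi\rangle .$$

So $\mathrm{Ker}(H^* \pm i) = \{0\}$. Closedness of $H^*$ and thus of $H$ was shown in Theorem 14.2.
(ii) $\Leftrightarrow$ (iii). We formulate this as a lemma.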


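Lemma 14.1. Let $T : \mathcal{H} \supset \mathcal{D}(T) \to \mathcal{H}$ be densely defined. Then:


1. $\mathrm{Ker}(T^* \mp i) = \mathrm{Ran}(T \pm i)^\perp$ and hence, in particular,

$$\mathrm{Ker}(T^* \mp i) = \{0\} \iff \mathrm{Ran}(T \pm i) \text{ is dense in } \mathcal{H} .$$

2. If $T$ is closed and symmetric, then $\mathrm{Ran}(T \pm i)$ is closed.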


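3 We recall the definitions of the kernel and the range of a linear operator, which are the subspaces
$\mathrm{Ker}(T) := \{\psi \in \mathcal{D}(T) \mid T\psi = 0\}$ and $\mathrm{Ran}(T) := \{\varphi \in \mathcal{H} \mid \varphi = T\psi \text{ for some } \psi \in \mathcal{D}(T)\}$.

For (1), note that $(T + i)^* = T^* - i$ and therefore

$$\begin{aligned}
\psi \in \mathrm{Ran}(T + i)^\perp &\iff \langle\psi \,|\, (T + i)\varphi\rangle = 0 \quad \forall \varphi \in \mathcal{D}(T) \\
&\iff \psi \in \mathcal{D}(T^*) \text{ and } (T^* - i)\psi = 0 \\
&\iff \psi \in \mathrm{Ker}(T^* - i) .
\end{aligned}$$

The other sign is treated analogously.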
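For (2), one uses the fact that, for symmetric $T$ and $\varphi \in \mathcal{D}(T)$, one always has
$\langle\varphi | T\varphi\rangle \in \mathbb{R}$. Thus

$$\begin{aligned}
\|(T + i)\varphi\|^2 &= \|T\varphi\|^2 + \|\varphi\|^2 + 2\,\mathrm{Re}\,\langle i\varphi | T\varphi\rangle \\
&= \|T\varphi\|^2 + \|\varphi\|^2 \ \ge\ \|\varphi\|^2 ,
\end{aligned} \qquad (14.15)$$

which implies that $T + i$ is injective and $(T + i)^{-1} : \mathrm{Ran}(T + i) \to \mathcal{D}(T)$ is bounded.
Now let $\psi_n$ be a sequence in $\mathrm{Ran}(T + i)$ with $\psi_n \to \psi$. Then $\varphi_n := (T + i)^{-1}\psi_n$
is a Cauchy sequence by (14.15) and hence converges to some $\varphi \in \mathcal{H}$. Since the graph of $T$
is closed, the graph of $T + i$ is also closed, and it follows that $(\varphi, \psi) \in \Gamma(T + i)$, i.e.,
$\varphi = (T + i)^{-1}\psi$, and thus $\psi \in \mathrm{Ran}(T + i)$. So $\mathrm{Ran}(T + i)$ is closed.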
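
As a sanity check of (14.15), here is a minimal finite-dimensional sketch. The Hermitian matrix `T` and the vector `phi` are arbitrary illustrative choices; in finite dimensions every symmetric operator is automatically self-adjoint, so this only illustrates the algebra, not the domain questions.

```python
import numpy as np

# An arbitrary Hermitian matrix standing in for a symmetric operator T.
T = np.array([[2.0, 1.0 - 1.0j, 0.0],
              [1.0 + 1.0j, -1.0, 0.5j],
              [0.0, -0.5j, 0.3]])
phi = np.array([1.0 + 2.0j, -0.5j, 0.7])

# Left- and right-hand side of (14.15).
lhs = np.linalg.norm(T @ phi + 1j * phi) ** 2
rhs = np.linalg.norm(T @ phi) ** 2 + np.linalg.norm(phi) ** 2

# The cross term 2 Re <i phi | T phi>; np.vdot conjugates its first argument.
cross = 2.0 * np.real(np.vdot(1j * phi, T @ phi))
print(lhs, rhs, cross)
```

The cross term vanishes because $\langle\varphi|T\varphi\rangle$ is real, which is exactly where symmetry of $T$ enters.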

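It remains to show (iii) $\Rightarrow$ (i). We have already shown that $H \subset H^*$ for symmetric
operators [see (14.9)], and it remains to show $\mathcal{D}(H^*) \subset \mathcal{D}(H)$. So let $\psi \in \mathcal{D}(H^*)$.
Since $\mathrm{Ran}(H - i) = \mathcal{H}$, there exists $\varphi \in \mathcal{D}(H)$ such that

$$(H - i)\varphi = (H^* - i)\psi .$$

But from $H \subset H^*$, we get

$$(H^* - i)\varphi = (H^* - i)\psi \quad\text{or}\quad (H^* - i)(\varphi - \psi) = 0 .$$

By Lemma 14.1, $\mathrm{Ker}(H^* - i) = \mathrm{Ran}(H + i)^\perp = \{0\}$, and it follows that $\psi = \varphi \in \mathcal{D}(H)$.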


Since a self-adjoint operator is usually characterized by providing a core, the fol- 
lowing corollary is very useful. 


Corollary 14.3. Let $H$ be a densely defined symmetric operator on a Hilbert space
$\mathcal{H}$. Then the following assertions are equivalent:


(i) $H$ is essentially self-adjoint.
(ii) $\mathrm{Ker}(H^* \pm i) = \{0\}$.
(iii) $\mathrm{Ran}(H \pm i)$ is dense.


While (ii) $\Leftrightarrow$ (iii) is again Lemma 14.1, the equivalence (i) $\Leftrightarrow$ (ii) follows from:

$$\begin{aligned}
H \text{ essentially self-adjoint} &\iff \overline{H} = H^{**} \text{ self-adjoint} \\
&\iff H^{**} \text{ closed and } \mathrm{Ker}(H^{***} \pm i) = \{0\} \\
&\iff \mathrm{Ker}(H^* \pm i) = \{0\} .
\end{aligned}$$

Remark 14.1. Instead of only looking at the points $\pm i$ in Theorem 14.3 (ii) and (iii)
and Corollary 14.3 (ii) and (iii), we could refer to all $\lambda \in \mathbb{C} \setminus \mathbb{R}$, i.e., $\mathrm{Im}\,\lambda \neq 0$. One
can see this from the proof, or just convince oneself that

$$H \text{ self-adjoint} \iff aH + b \text{ self-adjoint on } \mathcal{D}(H) \text{ for } a, b \in \mathbb{R},\ a \neq 0 .$$


If $H$ is a differential operator, as in the Schrödinger case, i.e., $H = -\Delta + V$, then
with Corollary 14.3 (ii), we can show that an operator is not essentially self-adjoint
by looking for solutions of the corresponding stationary Schrödinger equation

$$[-\Delta_x + V(x)]\,\psi(x) = i\,\psi(x) \quad\text{or}\quad [-\Delta_x + V(x)]\,\psi(x) = -i\,\psi(x)$$


within the domain of the adjoint operator. Of course, we must allow not only for 
smooth solutions, but also for solutions in the distributional sense. However, in con- 
crete applications, the potentials are smooth away from singular points and the cor- 
responding distributional solutions are also smooth away from the singular points. 
In example (14.11), we can either conclude from the symmetry of $H_0^*$ that $H_0$ is
essentially self-adjoint, or we can use Corollary 14.3, since

$$\mathcal{D}(H_0^*) = \{\varphi \in L^2 \mid |k|^2\hat\varphi \in L^2\} ,$$

and the linear ordinary differential equation $-\varphi'' = i\varphi$ has only the two linearly
independent solutions

$$\varphi_\pm(x) = \exp\Big(\pm\frac{1 - i}{\sqrt{2}}\,x\Big) ,$$

which are not in $L^2(\mathbb{R}, dx)$. The same holds for the solutions of $-\varphi'' = -i\varphi$, and
hence $\mathrm{Ker}(H_0^* \pm i) = \{0\}$.
On the other hand, in (14.13) the solutions of $-\varphi'' = i\varphi$ are again

$$\exp\Big(\pm\frac{1 - i}{\sqrt{2}}\,x\Big) , \qquad x \in (0, \infty) ,$$

where now, on the half-line, the decaying solution is square integrable,

$$\exp\Big(-\frac{1 - i}{\sqrt{2}}\,x\Big) \in L^2(\mathbb{R}^+, dx) ,$$

that is,

$$\mathrm{Ker}(H_0^* - i) \neq \{0\} .$$

Therefore, $H_0$ is not essentially self-adjoint, but we expect $\dim \mathrm{Ker}(H_0^* - i) =
\dim \mathrm{Ker}(H_0^* + i)$.

Remark 14.2. Deficiency Index

The dimensions $n_\pm = \dim \mathrm{Ker}(H^* \pm i)$ of the kernels are called deficiency indices
of the operator $H$. When does a symmetric operator have self-adjoint extensions? If
and only if the deficiency indices are the same, i.e., $n_+ = n_-$. If $n_+ = n_- \neq 0$, then
$H$ does indeed have infinitely many self-adjoint extensions, more precisely a family
of self-adjoint extensions parameterized by $n_+^2$ real parameters. This is covered by
von Neumann's theory of self-adjoint extensions (see, for example, [2]). Like most
things, this is not surprising, since for $n_+ = n_-$ one can balance the mass-loss
solutions ($+i$) with the mass-gain solutions ($-i$). It is also clear that this is connected
to time-reversal invariance. If $H$ commutes with complex conjugation, i.e., if the
Schrödinger equation is invariant under time reversal, then one has $n_+ = n_-$.
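
The deficiency solutions of the half-line example can be checked by hand or numerically. The sketch below (variable names and the integration grid are illustrative assumptions) verifies that $\kappa = (1-i)/\sqrt{2}$ satisfies $\kappa^2 = -i$, so that $e^{\pm\kappa x}$ solve $-\varphi'' = i\varphi$, and that only the decaying solution has finite norm on $(0, \infty)$, with $\|e^{-\kappa x}\|^2 = \int_0^\infty e^{-\sqrt{2}x}\,dx = 1/\sqrt{2}$.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

kappa = (1.0 - 1.0j) / np.sqrt(2.0)

# exp(+kappa x) and exp(-kappa x) solve -phi'' = i phi iff kappa^2 = -i.
residual = kappa**2 + 1.0j

x = np.linspace(0.0, 60.0, 600_001)
decaying = np.exp(-kappa * x)   # modulus e^{-x / sqrt(2)}
growing = np.exp(kappa * x)     # modulus e^{+x / sqrt(2)}

norm_sq_decaying = trapezoid(np.abs(decaying) ** 2, x)  # close to 1 / sqrt(2)
size_growing_at_end = np.abs(growing[-1])                # grows without bound
print(abs(residual), norm_sq_decaying, size_growing_at_end)
```

On the whole line both exponentials blow up in one direction, which is why the kernels are trivial there; on the half-line exactly one solution survives for each sign, consistent with deficiency indices $n_+ = n_- = 1$.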


Remark 14.3. On Self-Adjointness and Generators 

As announced previously, we would like to make it at least plausible that the re- 
verse direction of Theorem 14.1 should hold, i.e., that self-adjoint operators should 
generate unitary groups. First of all, in the case of a bounded self-adjoint operator 
H, one can define the corresponding unitary group directly through the convergent 
exponential series

$$e^{-itH} = \sum_{n=0}^{\infty} \frac{(-itH)^n}{n!} ,$$

and just check by explicit computations that this really defines a unitary group in
the sense of Definition 14.1.

In order to understand why, for unbounded operators $H$, the more technical notion
of self-adjointness becomes relevant, we explain how Theorem 14.3 and, in particular,
$\mathrm{Ran}(H \pm i) = \mathcal{H}$ [which is as good as $\mathrm{Ran}(H \pm i\lambda) = \mathcal{H}$ with $0 \neq \lambda \in \mathbb{R}$]
enters into the construction of the unitary group $U(t) = e^{-itH}$. In the case of
unbounded $H$, we can no longer define the exponential $e^{-itH}$ through the series.
However, an alternative approach which is closer to the idea of a group is

$$e^{-itH} = \lim_{n\to\infty} \Big(1 + \frac{itH}{n}\Big)^{-n} = \lim_{n\to\infty} i^{-n} \Big(\frac{tH}{n} - i\Big)^{-n} , \qquad (14.16)$$

where the thinking behind the negative exponent $-n$ will become clear in a moment.
With Theorem 14.3, we have $\mathrm{Ran}(\alpha H - i) = \mathcal{H}$ and $\mathrm{Ker}(\alpha H - i) = \{0\}$ for $\alpha \in \mathbb{R}$.
Hence $(\alpha H - i)$ is invertible on $\mathcal{H}$ and

$$\|(\alpha H - i)\varphi\|^2 = \alpha^2\|H\varphi\|^2 + \|\varphi\|^2 \ge \|\varphi\|^2 ,$$

i.e., for $\varphi = (\alpha H - i)^{-1}\psi$, one has

$$\|(\alpha H - i)^{-1}\psi\| \le \|\psi\| .$$

With $\mathrm{Ran}(\alpha H - i) = \mathcal{H}$, it follows that

$$(\alpha H - i)^{-1} : \mathcal{H} \to \mathcal{D}(H) ,$$

and therefore $(tH/n - i)^{-n} : \mathcal{H} \to \mathcal{D}(H)$ is bounded for any $n \in \mathbb{N}$. Hence, the
product on the right-hand side of (14.16) defines a bounded operator with norm
bounded by one for any $n \in \mathbb{N}$, and we can expect the limit to exist.
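
The limit (14.16) can be watched converging in a finite-dimensional toy model. This is a sketch only: the symmetric matrix `H`, the time `t`, and the function names are illustrative assumptions, and for matrices the exponential series would of course work just as well. The point is that each resolvent power has norm at most one and approaches the exact group element.

```python
import numpy as np

# Illustrative real symmetric matrix standing in for a self-adjoint H.
H = np.array([[1.0, 0.5, 0.0],
              [0.5, -0.5, 0.25],
              [0.0, 0.25, 2.0]])
t = 1.0

def exact_group(H, t):
    """e^{-itH} via the spectral decomposition of the Hermitian matrix H."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def resolvent_power(H, t, n):
    """(1 + itH/n)^{-n}, the approximation appearing in (14.16)."""
    one = np.eye(H.shape[0])
    step = np.linalg.inv(one + 1j * t * H / n)
    return np.linalg.matrix_power(step, n)

U = exact_group(H, t)
ns = (10, 100, 1000, 10000)
errors = [np.linalg.norm(resolvent_power(H, t, n) - U, 2) for n in ns]
norms = [np.linalg.norm(resolvent_power(H, t, n), 2) for n in ns]
print(errors)
print(norms)
```

The error decays roughly like $1/n$, and the operator norms stay below one, mirroring the bound $\|(\alpha H - i)^{-1}\| \le 1$ used in the remark.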


14.3 The Atomistic Schrödinger Operator


Let us now apply these ideas to the atomistic Hamiltonian with Coulomb potential.
We consider

$$H_C = -\Delta - \frac{1}{r} \quad\text{on}\quad \mathcal{D}(H_C) = C_0^\infty(\mathbb{R}^3 \setminus \{0\}) ,$$

where $|x| = r$ and we put the electric charge equal to 1. Why $C_0^\infty(\mathbb{R}^3 \setminus \{0\})$? The
reason is that $r = 0$ is clearly a singular point, and the Schrödinger equation can
only hold for $r > 0$. Recall the example of the particle on the half-line. There the
origin could not be crossed. But this was in one dimension, and here we have three
dimensions, so taking out a single point could be less problematic. However, this is
true only from dimension four upwards, and we must still be careful here.

One can understand the situation by looking at the simpler problem $H_0 = -\Delta$
with $\mathcal{D} = C_0^\infty(\mathbb{R}^3 \setminus \{0\})$. Even the free Laplacian without any singular potential is
not essentially self-adjoint on a domain with a missing point. To see this, according
to Corollary 14.3, we need to find solutions of $H_0^*\psi_\pm = \mp i\psi_\pm$. We know that, on a
suitable domain, we have $H_0^* = -\Delta$, and thus we first look for solutions of

$$-\Delta_x\psi_\pm(x) = \mp i\,\psi_\pm(x) , \qquad (14.17)$$

for which we will show later that $\psi_\pm \in \mathcal{D}(H_0^*)$. Equation (14.17) is most conveniently
solved in spherical coordinates, and seeking spherically symmetric solutions,
one arrives at

$$\Big(-\frac{d^2}{dr^2} - \frac{2}{r}\frac{d}{dr}\Big)\psi_\pm(r) = \mp i\,\psi_\pm(r) , \qquad (14.18)$$

with the solutions

$$\psi_\pm(r) = \frac{e^{-(1 \pm i)r/\sqrt{2}}}{r} .$$

Clearly, $\psi_\pm \in L^2(\mathbb{R}^3)$, since the $1/|x|$ singularity is square integrable in three space
dimensions. However, we still need to show that $\psi_\pm$ are in the domain of $H_0^*$. This
follows from the usual computation using integration by parts. Let $\varphi \in \mathcal{D}(H_0)$. Then
$\varphi$ vanishes in a neighborhood of the origin and the boundary terms vanish:

$$\begin{aligned}
\langle\psi_\pm | H_0\varphi\rangle_{L^2(\mathbb{R}^3)} &= -\int d^3x \, \psi_\pm^*(x)\,\Delta_x\varphi(x) \\
&= -\int_{S^2} d\omega \int_0^\infty r^2 dr \, \psi_\pm^*(r) \Big(\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\Delta_\omega\Big)\varphi(r, \omega) \\
&= -\int_{S^2} d\omega \int_0^\infty r^2 dr \, \Big[\Big(\frac{d^2}{dr^2} + \frac{2}{r}\frac{d}{dr}\Big)\psi_\pm^*(r)\Big]\varphi(r, \omega) \\
&= \langle \mp i\,\psi_\pm | \varphi\rangle_{L^2(\mathbb{R}^3)} ,
\end{aligned}$$

whence $\psi_\pm \in \mathcal{D}(H_0^*)$ and $H_0^*\psi_\pm = \mp i\psi_\pm$. Thus not even $H_0 = -\Delta$ is essentially
self-adjoint on the domain $\mathcal{D} = C_0^\infty(\mathbb{R}^3 \setminus \{0\})$. It has many different self-adjoint
extensions.⁴ There is of course no problem with that. It is a matter of physics to select the
correct "physical" extension. We would have come to exactly the same conclusion
with the Coulomb potential added to $H_0$. Hence the atomistic Schrödinger operator
$H_C$ with $\mathcal{D}(H_C) = C_0^\infty(\mathbb{R}^3 \setminus \{0\})$ is not essentially self-adjoint, but has many different
self-adjoint extensions. But the different self-adjoint extensions have different
eigenvalues and generate different time evolutions, i.e., they correspond to different
physics.


Remark 14.4. About Self-Adjoint Extensions

What should one do now? For the Schrödinger operator with the Coulomb potential,
we have the same problem as in the simple example (14.13), where we were able
to select self-adjoint extensions by posing proper boundary conditions. In principle,
we should be able to do the same in the case of the Coulomb potential. However,
this will be more complicated, since the potential is singular at the boundary. The
origin is a singular point of the potential, where the wave function need not and
will not be differentiable. In spherical coordinates, the eigenvalue equation for the
Coulomb problem reads [see (14.18)]

$$\Big(-\frac{d^2}{dr^2} - \frac{2}{r}\frac{d}{dr} - \frac{1}{r}\Big)\psi = E\psi . \qquad (14.19)$$

The "physical" ground state eigenfunction of the hydrogen atom corresponding to
the ground state energy $E_0 = -1/4$ is

$$\psi_0(r) \propto \exp\Big(-\frac{r}{2}\Big) .$$

While $\psi_0$ is not differentiable at the origin, it is bounded. And being bounded is
also a sort of boundary condition. To see that there are also unbounded solutions of
(14.19), note that $\psi$ solves (14.19) if and only if $f = r\psi$ solves

$$-f'' - \frac{f}{r} = Ef . \qquad (14.20)$$

With each solution $f$ of this equation, the function obtained by reduction of order,

$$F(f)(r) := f(r)\int^r \frac{ds}{f(s)^2} ,$$

is also a solution of (14.20). In particular, $f_1 = F(f_0)$ yields a solution $f_1/r$ of
(14.19) which diverges at the origin as $\psi \sim 1/r$, but is still square integrable at the
origin. However, since $f_1$ grows exponentially for $r \to \infty$, one needs to go to $F(f_1)$ to obtain a
solution with good decay properties at infinity, but which is still nonzero at the origin.
The corresponding singular but square integrable $\psi$-functions are usually excluded
as being unphysical in the physics textbooks, because they are not defined at the
origin (like the Coulomb Hamiltonian itself!). Note that, being unbounded, they too
cannot be contained in the domain $\mathcal{D}(H_0)$. As the following clever computation
shows, each $\varphi \in \mathcal{D}(H_0) = \{\varphi \in L^2 \mid |k|^2\hat\varphi \in L^2\}$ is bounded. With

$$\varphi(x) = (2\pi)^{-3/2}\int e^{ik\cdot x}\,\hat\varphi(k)\,d^3k ,$$

the usual trick yields

$$\begin{aligned}
\|\varphi\|_\infty &\le (2\pi)^{-3/2}\int |\hat\varphi(k)|\,d^3k \\
&= (2\pi)^{-3/2}\int |\hat\varphi(k)|\,\frac{1 + k^2}{1 + k^2}\,d^3k \\
&\overset{(*)}{\le} C\Big[\int |\hat\varphi(k)|^2 (1 + k^2)^2\,d^3k\Big]^{1/2} \\
&\le \sqrt{2}\,C\Big[\int |\hat\varphi(k)|^2 (1 + k^4)\,d^3k\Big]^{1/2} \\
&\overset{(**)}{=} \sqrt{2}\,C\big(\|\varphi\|^2 + \|H_0\varphi\|^2\big)^{1/2} \\
&\le C\big(\|\varphi\| + \|H_0\varphi\|\big) ,
\end{aligned} \qquad (14.21)$$

where the value of the constant $C$ may change from line to line.
In $(*)$, we used the Cauchy-Schwarz inequality with $\int d^3k/(1 + k^2)^2 < \infty$, and in
$(**)$, the Plancherel equality (13.15). This shows that the self-adjoint extensions of
the Coulomb Hamiltonian with such unbounded eigenfunctions cannot be defined
on the natural domain $\mathcal{D}(H_0) \cap \mathcal{D}(V)$. [We will soon see that this is actually equal
to $\mathcal{D}(H_0)$.]


4 One of these extensions with $\mathcal{D}(H_0) = \{\varphi \in L^2 \mid |k|^2\hat\varphi \in L^2\}$ is the generator of the free time
evolution. The other extensions correspond to so-called $\delta$-interactions, i.e., to Dirac delta potentials
of various strengths at the origin.
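
The constant in step $(*)$ of (14.21) can be made explicit: $\int d^3k/(1+k^2)^2 = 4\pi\int_0^\infty k^2/(1+k^2)^2\,dk = \pi^2$. The following sketch checks this numerically and then tests the resulting bound $\|\varphi\|_\infty \le \sqrt{2}\,C(\|\varphi\|^2 + \|H_0\varphi\|^2)^{1/2}$ on one example. The Gaussian $\varphi(x) = e^{-|x|^2/2}$, the substitution $k = \tan\theta$, and all names are illustrative assumptions, not taken from the text.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# With k = tan(theta), k^2 / (1 + k^2)^2 dk becomes sin^2(theta) d(theta) on [0, pi/2].
theta = np.linspace(0.0, np.pi / 2.0, 100_001)
k_integral = 4.0 * np.pi * trapezoid(np.sin(theta) ** 2, theta)   # close to pi^2

# Constant from Cauchy-Schwarz in step (*).
C = (2.0 * np.pi) ** (-1.5) * np.sqrt(k_integral)

# Test function phi(x) = exp(-|x|^2 / 2): sup norm 1, and -Laplace(phi) = (3 - r^2) phi.
r = np.linspace(0.0, 12.0, 240_001)
weight = 4.0 * np.pi * r**2
norm_phi_sq = trapezoid(weight * np.exp(-r**2), r)
norm_H0phi_sq = trapezoid(weight * (3.0 - r**2) ** 2 * np.exp(-r**2), r)

sup_phi = 1.0
bound = np.sqrt(2.0) * C * np.sqrt(norm_phi_sq + norm_H0phi_sq)
print(k_integral, C, sup_phi, bound)
```

For this Gaussian the bound is not sharp but holds comfortably, which is all the argument needs.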


Back to the general question. It was Kato's idea to ignore the problem of boundary
conditions and to focus on the domain. The natural domain is clearly $\mathcal{D}(H_0) \cap \mathcal{D}(V)$,
and on that domain the "physical" Coulomb Hamiltonian should be self-adjoint. We
first convince ourselves that $\mathcal{D}(V) \supset \mathcal{D}(H_0)$.
To this end, we split the potential into $V = c/r = V_1 + V_2$, where $\|V_1\| \le \varepsilon$ and

$$\|V_2\|_\infty = \sup_x |V_2(x)| = a < \infty ,$$

that is,

$$V_1 = \frac{c}{r}\,\chi_{\{r < \varepsilon^2/4\pi c^2\}} , \qquad V_2 = \frac{c}{r}\,\chi_{\{r \ge \varepsilon^2/4\pi c^2\}} .$$

With this splitting and for $\varphi \in \mathcal{D}(H_0)$ with (14.21), we obtain

$$\begin{aligned}
\|V\varphi\| &\le \|V_1\varphi\| + \|V_2\varphi\| \le \|V_1\|\,\|\varphi\|_\infty + \|V_2\|_\infty\|\varphi\| \\
&\le \varepsilon C\big(\|\varphi\| + \|H_0\varphi\|\big) + a\|\varphi\| .
\end{aligned} \qquad (14.22)$$


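We can now finish the argument. We try to interpret $V(r)$ as a "perturbation" of the
free Hamiltonian $H_0 = -\Delta$, where the domain $\mathcal{D}(H_0)$ of the self-adjoint operator
$H_0$ should determine the "correct" self-adjoint version of $H = H_0 + V$. Since the
domain $\mathcal{D}(V)$ of $V$ (as a multiplication operator) contains $\mathcal{D}(H_0)$, one can define
the operator $H = H_0 + V$ on $\mathcal{D}(H_0)$. We now continue abstractly. For $H = H_0 + V$
to be self-adjoint, according to Theorem 14.3, we must have

$$\mathrm{Ran}(H_0 + V + \lambda) = \mathcal{H}$$

for $\lambda \in \mathbb{C} \setminus \mathbb{R}$. This follows if $(H_0 + V + \lambda)^{-1}$ exists as a bounded operator on $\mathcal{H}$.
But, since $H_0 + V + \lambda = [1 + V(H_0 + \lambda)^{-1}](H_0 + \lambda)$, we have formally

$$(H_0 + V + \lambda)^{-1} = (H_0 + \lambda)^{-1}\,\big[1 + V(H_0 + \lambda)^{-1}\big]^{-1} ,$$

where $(H_0 + \lambda)^{-1}$ is a well defined bounded operator, since $H_0$ is self-adjoint and
therefore

$$\mathrm{Ran}(H_0 + \lambda) = \mathcal{H} , \qquad \mathrm{Ker}(H_0 + \lambda) = \{0\} .$$

It suffices to consider purely imaginary values, i.e., to replace $\lambda$ by $i\lambda$ with $0 \neq \lambda \in \mathbb{R}$.
The operator $[1 + V(H_0 + i\lambda)^{-1}]^{-1}$ can then be represented as a geometric series if

$$\|V(H_0 + i\lambda)^{-1}\| < 1 ,$$

that is, if

$$\|V(H_0 + i\lambda)^{-1}\varphi\| \le \tilde{a}\,\|\varphi\|$$

holds for all $\varphi \in \mathcal{H}$ with some $\tilde{a} < 1$. (We will discuss this in more detail in the next
chapter.) Now put $\psi = (H_0 + i\lambda)^{-1}\varphi$, so that

$$\|V\psi\| \le \tilde{a}\,\|(H_0 + i\lambda)\psi\| .$$

Since

$$\|(H_0 + i\lambda)\psi\|^2 = \|H_0\psi\|^2 + \lambda^2\|\psi\|^2 ,$$

we can equivalently require that

$$\|V\psi\|^2 \le \tilde{a}^2\|H_0\psi\|^2 + \tilde{b}^2\|\psi\|^2 \qquad (14.23)$$

for all $\psi \in \mathcal{D}(H_0)$ with $\tilde{a} < 1$. Finally, observe that (14.23) is equivalent to

$$\|V\psi\| \le a\|H_0\psi\| + b\|\psi\| \qquad (14.24)$$

for all $\psi \in \mathcal{D}(H_0)$. The fact that (14.23) $\Rightarrow$ (14.24) is obvious, and (14.24) $\Rightarrow$
(14.23) can be seen with $\tilde{a}^2 = (1 + \varepsilon)a^2$ and $\tilde{b}^2 = (1 + 1/\varepsilon)b^2$ for arbitrary $\varepsilon > 0$.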
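
The geometric (Neumann) series step can be illustrated with matrices. This is only a finite-dimensional sketch: the diagonal `H0`, the small symmetric perturbation `V`, and the value `lam` are arbitrary illustrative choices. It checks that, once $\|V(H_0 + i\lambda)^{-1}\| < 1$, the series $\sum_k (-V(H_0+i\lambda)^{-1})^k$ multiplied from the left by $(H_0 + i\lambda)^{-1}$ reproduces $(H_0 + V + i\lambda)^{-1}$.

```python
import numpy as np

# Illustrative "free" operator and perturbation (both Hermitian).
H0 = np.diag([0.0, 1.0, 4.0, 9.0])
V = 0.3 * np.array([[0.0, 1.0, 0.0, 0.0],
                    [1.0, 0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 0.0]])
lam = 2.0
one = np.eye(4)

R0 = np.linalg.inv(H0 + 1j * lam * one)   # (H0 + i lam)^{-1}
K = V @ R0                                # V (H0 + i lam)^{-1}
q = np.linalg.norm(K, 2)                  # must be < 1 for the series to converge

# Partial sum of the geometric series sum_k (-K)^k.
series = np.zeros((4, 4), dtype=complex)
term = one.astype(complex)
for _ in range(80):
    series += term
    term = term @ (-K)

approx = R0 @ series
exact = np.linalg.inv(H0 + V + 1j * lam * one)
print(q, np.linalg.norm(approx - exact, 2))
```

Note the order of the factors: the series sits to the right of $(H_0 + i\lambda)^{-1}$, which matters as soon as $V$ and $H_0$ do not commute.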


Definition 14.8. Let $H_0$ with domain $\mathcal{D}(H_0)$ be self-adjoint, and let $V$ with domain
$\mathcal{D}(V)$ be symmetric. One says that $V$ is relatively bounded with respect to $H_0$ (or
more briefly, $H_0$-bounded), if $\mathcal{D}(H_0) \subset \mathcal{D}(V)$ and if (14.24) or equivalently (14.23)
holds for some $a, b \in \mathbb{R}^+$, or $\tilde{a}, \tilde{b} \in \mathbb{R}^+$. The infimum of all admissible $a$ is called
the relative bound and agrees with the infimum of all admissible $\tilde{a}$.


From the above argument we obtain the following theorem due to Kato. 


Theorem 14.4. Let $H_0$ on $\mathcal{D}(H_0)$ be self-adjoint. Let $V$ be $H_0$-bounded with relative
bound $a < 1$. Then $H = H_0 + V$ is self-adjoint on $\mathcal{D}(H) = \mathcal{D}(H_0)$.


According to (14.22), the Coulomb potential is $H_0$-bounded with relative bound
equal to zero, and if we refer to the Coulomb Hamiltonian we mean exactly the
one defined on $\mathcal{D}(H_0)$ as a self-adjoint operator in Theorem 14.4. And if physicists
compute the spectrum of the Coulomb Hamiltonian, they only take the eigenvalues
with bounded eigenfunctions, i.e., they consider $H$ on $\mathcal{D}(H) = \mathcal{D}(H_0)$.


References 


1. M. Reed, B. Simon: Methods of Modern Mathematical Physics I: Functional Analysis, revised 
and enlarged edn. (Academic Press, San Diego, 1980) 

2. M. Reed, B. Simon: Methods of Modern Mathematical Physics II: Fourier Analysis,
Self-Adjointness (Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1975)


Mathematical Physics 


Chapter 15 
Measures and Operators 


We now discuss the operator calculus anticipated in Chap. 12. This calculus allows 
us to do computations within quantum equilibrium in a very efficient and concise 
way. The core element is the spectral theorem for self-adjoint operators, which es- 
tablishes a precise connection between operator-valued measures and self-adjoint 
operators. 

In (12.16a), we reduced the quantum equilibrium statistics to the map

$$\begin{aligned}
\psi \mapsto \mathbb{P}^{\Psi_T}\big(F^{-1}(A)\big) &= \int_{F^{-1}(A)} |\Psi_T(x, y)|^2 \, dx \, dy \\
&= \big\langle \psi \otimes \Phi \,\big|\, U(T)^*\,\chi_{F^{-1}(A)}\,U(T)\,\psi \otimes \Phi \big\rangle \\
&=: B_A(\psi, \psi) ,
\end{aligned} \qquad (15.1)$$

where $\psi$ is the effective wave function of the system, $U(T)$ is the unitary time
evolution, $\psi \otimes \Phi$ is the initial condition, and $\Psi_T$ is the wave function of the entire
system (including possibly a piece of apparatus) at some large time $T$ (when the
experiment ends, e.g., when the result is displayed). Most importantly, the function
$F$ is a coarse-graining of configuration space. It maps microscopic configurations
of the entire system to the macroscopic outcome of the experiment, e.g., to pointer
positions. If $\Lambda \subset \mathbb{R}^n$ is the range of $F$, i.e., the set of possible outcomes, then for
$A \subset \Lambda$ the probability of finding an outcome in $A$ is given by the quadratic form
$B_A(\psi, \psi)$.

According to Theorem 13.4, we can associate a bounded linear operator $P_A$ with
the sesquilinear form $B_A(\psi, \varphi)$ such that

$$B_A(\psi, \varphi) = \langle P_A\psi | \varphi\rangle \quad\text{for all } \psi, \varphi \in \mathcal{H} \ (= L^2) .$$

With $B_A(\varphi, \varphi) \ge 0$ for all $\varphi$, we also have $\langle P_A\varphi | \varphi\rangle \ge 0$ for all $\varphi$, and such operators
are said to be positive. Positive operators on complex Hilbert spaces are self-adjoint.
From

$$0 \le \langle P\varphi | \varphi\rangle = \langle\varphi | P\varphi\rangle^* = \langle\varphi | P\varphi\rangle ,$$

we see that $P$ is symmetric with respect to the diagonal, and polarization yields

$$\langle P\varphi | \psi\rangle = \langle\varphi | P\psi\rangle \quad \forall \varphi, \psi \in \mathcal{H} .$$

Now $A \subset \Lambda$, or more precisely $A \in \mathcal{B}(\Lambda)$, if $F$ is a Borel measurable function.
Hence there is a family of positive operators $(P_A)_{A \in \mathcal{B}(\Lambda)}$ which has the properties
of a probability measure, but with values in the positive operators instead of $[0, 1]$.


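Definition 15.1. Let $\Lambda \subset \mathbb{R}^n$ be a measurable set with the corresponding $\sigma$-algebra
$\mathcal{B}(\Lambda)$. A family $(P_A)_{A \in \mathcal{B}(\Lambda)}$ of bounded linear operators $P_A \in \mathcal{L}(\mathcal{H})$ is called a
positive operator valued measure (POVM) if it has the following properties:


(i) Each $P_A$ is positive.

(ii) $P_\emptyset = 0$, $P_\Lambda = 1_{\mathcal{H}}$.

(iii) If $(A_j)_{j \in \mathbb{N}}$ are pairwise disjoint measurable sets, i.e., $A_j \in \mathcal{B}(\Lambda)$ and also
$A_i \cap A_j = \emptyset$ for $i \neq j$, then

$$P_{\cup_j A_j} = \operatorname*{s-lim}_{N\to\infty} \sum_{j=1}^{N} P_{A_j} ,$$

where s-lim is the strong limit, i.e.,

$$\lim_{N\to\infty} \Big\| \Big(P_{\cup_j A_j} - \sum_{j=1}^{N} P_{A_j}\Big)\psi \Big\| = 0 \quad\text{for all } \psi \in \mathcal{H} .$$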
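Remark 15.1. Integration with Respect to a POVM

For each $\varphi \in \mathcal{H}$, it follows that

$$P^\varphi : \mathcal{B}(\Lambda) \to [0, \infty) , \qquad A \mapsto P^\varphi(A) = \langle\varphi | P_A\varphi\rangle ,$$

defines a bounded positive Borel measure on $\Lambda$, which we can use to integrate
bounded measurable functions on $\Lambda$, and we thus define

$$\Big\langle \varphi \,\Big|\, \Big[\int f(\lambda)\,dP_\lambda\Big] \varphi \Big\rangle := \int f(\lambda)\,dP^\varphi(\lambda) .$$

In order to define the operator $\int_\Lambda f(\lambda)\,dP_\lambda$ using the representation result of
Theorem 13.4, we need to introduce the complex Borel measures

$$P^{\psi,\varphi} : \mathcal{B}(\Lambda) \to \mathbb{C} , \qquad A \mapsto P^{\psi,\varphi}(A) := \langle\psi | P_A\varphi\rangle$$

and define

$$\Big\langle \psi \,\Big|\, \Big[\int f(\lambda)\,dP_\lambda\Big] \varphi \Big\rangle := \int f(\lambda)\,dP^{\psi,\varphi}(\lambda) .$$

Note that according to the polarization identity, the complex measure $P^{\psi,\varphi}$ is just
the following sum of positive measures:

$$P^{\psi,\varphi} = \tfrac{1}{4}\big(P^{\psi+\varphi} - P^{\psi-\varphi} + i\,P^{\psi-i\varphi} - i\,P^{\psi+i\varphi}\big) ,$$

and this provides a way of defining integration with respect to such a complex measure.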
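
The polarization identity, with the inner product antilinear in its first argument as used throughout, can be verified on a small example. In this sketch the positive matrix `P` (built as $A^*A$) and the vectors are arbitrary illustrative choices; `q(x)` plays the role of the positive measure $P^x(A) = \langle x|P_A x\rangle$ for one fixed set $A$.

```python
import numpy as np

# An arbitrary positive matrix P = A^* A standing in for one P_A.
A = np.array([[1.0, 2.0j, 0.5],
              [0.0, 1.0, -1.0j],
              [0.3, 0.0, 2.0]])
P = A.conj().T @ A

def q(x):
    """Quadratic form <x|P x>; real and non-negative because P is positive."""
    return float(np.real(np.vdot(x, P @ x)))

psi = np.array([1.0, 1.0j, -0.5])
phi = np.array([0.2 - 1.0j, 0.7, 1.0j])

# Direct evaluation; np.vdot conjugates its first argument.
direct = np.vdot(psi, P @ phi)

# Polarization: four values of the quadratic form recover the sesquilinear form.
polar = 0.25 * (q(psi + phi) - q(psi - phi)
                + 1j * q(psi - 1j * phi) - 1j * q(psi + 1j * phi))
print(direct, polar)
```

The signs of the two imaginary terms depend on which argument of the inner product is antilinear; with the opposite convention they would be swapped.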


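The simplest example of (15.1) is $\Psi_T = \psi$ and $F = \mathrm{id}$, which leads us to the position
POVM $(O_A)_{A \in \mathcal{B}(\mathbb{R}^n)}$ on $L^2(\mathbb{R}^n, d^nx)$ [see (12.21)]. It is given by multiplication by
the characteristic function,

$$O_A = \chi_A(x) = \begin{cases} 1 , & \text{if } x \in A , \\ 0 , & \text{otherwise} . \end{cases} \qquad (15.2)$$

The probability for the configuration to be in $A$ is

$$O^\psi(A) = \langle\psi | O_A\psi\rangle = \int |\psi(x)|^2\chi_A(x)\,d^nx = \int_A |\psi(x)|^2\,d^nx . \qquad (15.3)$$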


It is easy to see that the properties (i-iii) in Definition 15.1 hold for $O_A$. With the
position POVM, we can use (15.3) to compute the probability distribution of the
system configuration if its wave function is $\psi$. The position POVM has an additional
property, which was mentioned in Chap. 12. In fact, it satisfies

Definition 15.1 (iv) $P_{A_1 \cap A_2} = P_{A_1}P_{A_2}$,

since $O_{A_1 \cap A_2} = \chi_{A_1 \cap A_2} = \chi_{A_1}\chi_{A_2} = O_{A_1}O_{A_2}$, and, as a consequence, in particular

Definition 15.1 (iv') $P_A^2 = P_A$.

This structure plays an important role, as it is responsible for the quantum formalism.
A linear operator $P$ that satisfies $P^2 = P$ is called a projector. If a projector $P$
is also self-adjoint, then $P$ is positive:

$$\langle\varphi | P\varphi\rangle = \langle\varphi | P^2\varphi\rangle = \langle P\varphi | P\varphi\rangle \ge 0 .$$

To see that not all projections are self-adjoint, consider the projection onto a
one-dimensional subspace of $\mathbb{R}^2$ along a family of parallel lines that are not orthogonal
to the subspace (see Fig. 15.1).

The self-adjoint projections are therefore called orthogonal projections. They
satisfy $P^2 = P$ and $P^* = P$.
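
A concrete instance of a non-orthogonal projection in $\mathbb{R}^2$ may help here. In this sketch the matrix `P_oblique` projects onto the first coordinate axis along lines parallel to $(1, -1)$; the specific matrix and test vector are illustrative choices. It satisfies $P^2 = P$ but is not symmetric, and its quadratic form can be negative, so it cannot appear in a POVM.

```python
import numpy as np

# Oblique projection onto span{(1, 0)} along the direction (1, -1).
P_oblique = np.array([[1.0, 1.0],
                      [0.0, 0.0]])
# Orthogonal projection onto the same subspace.
P_orth = np.array([[1.0, 0.0],
                   [0.0, 0.0]])

phi = np.array([1.0, -2.0])

idempotent = np.allclose(P_oblique @ P_oblique, P_oblique)
symmetric = np.allclose(P_oblique, P_oblique.T)
form_oblique = float(phi @ (P_oblique @ phi))   # <phi|P phi> for the oblique projection
form_orth = float(phi @ (P_orth @ phi))         # <phi|P phi> for the orthogonal one
print(idempotent, symmetric, form_oblique, form_orth)
```

Here `form_oblique` comes out negative while `form_orth` equals $\|P\varphi\|^2$, in line with the positivity computation above.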


Definition 15.2. A family $(P_A)_{A \in \mathcal{B}(\Lambda)}$ of bounded linear operators $P_A \in \mathcal{L}(\mathcal{H})$ is
called a projection-valued measure (PVM), if it satisfies Definition 15.1 (i-iv), or
equivalently, Definition 15.1 (i-iii) and (iv').

Fig. 15.1 Projection ($\|\varphi\| = 1$)


Hence a PVM is a POVM in which all operators are orthogonal projections. 


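Remark 15.2. Indeed Definition 15.1 (i-iii) together with (iv') implies (iv). This follows from
$P_AP_{B \setminus A} = 0$ for $A \subset B$, which can be seen as follows. First,

$$\begin{aligned}
P_B = P_B^2 = (P_A + P_{B \setminus A})^2 &= P_A^2 + P_AP_{B \setminus A} + P_{B \setminus A}P_A + P_{B \setminus A}^2 \\
&= P_B + P_AP_{B \setminus A} + P_{B \setminus A}P_A
\end{aligned}$$

implies that

$$P_AP_{B \setminus A} + P_{B \setminus A}P_A = 0 .$$

Multiplying this by $P_A$ once from the left and once from the right yields

$$P_AP_{B \setminus A} = P_{B \setminus A}P_A = -P_AP_{B \setminus A}P_A ,$$

and therefore $P_AP_{B \setminus A} = P_{B \setminus A}P_A = 0$.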


We will return to PVMs shortly, but first let us recall our example POVM (12.39)
for a position measurement with some uncertainty. We defined

$$O_A^\rho = \int \rho(y - x)\,\chi_A(y)\,d^ny ,$$

that is,

$$O_A^\rho : \varphi(x) \mapsto \Big[\int \rho(y - x)\,\chi_A(y)\,d^ny\Big]\varphi(x) ,$$

where

$$(O_A^\rho)^2 \neq O_A^\rho$$

whenever $\rho(x) \neq \delta(x)$.

After this example let us emphasize once more that the statistics of any experi- 
ment in the sense of the sequence (15.1) are described by a POVM. These are all 
experiments where in the end the result can be read off from the configuration of the 
system or some apparatus. Therefore it might be surprising to find that these POVMs 
play little or no role in theoretical physics. The reason is that PVMs give rise to an el- 
egant mathematical formalism, while POVMs give rise to nothing. They are merely 
abstract descriptions of the statistics of some measurement, and one could just as 
well do without them, i.e., just stop after the first equality in (15.1). The justification
for the abstraction lies in the idealized description of the measurement process, as 
in (12.3) to (12.9). The book-keeping operators which stem from special POVMs, 
namely the PVMs, give rise to an elegant and compact formalism — a textbook for- 
malism — for computing expectation values, variances, and higher moments. We 
will shortly discuss three such operators which arise in the relevant measurement 
situations. 

But before that, we should emphasize that the idea of a measurement where one 
can “stick in” an arbitrary wave function and always get out some measurement 
result in the form of a pointer position is completely unrealistic. Any experiment 
will only lead to sensible outcomes for a quite special class of initial wave func- 
tions. Take for example scattering experiments. Here we are only interested in wave 
functions where the particle (that is the system) moves towards a target and scatters. 
Most initial wave functions will never reach the region of the target and will never 
be detected. They are irrelevant. The moral is that experiments are designed and 
conducted for a small class of special initial data. As a consequence, the general 
structure of a POVM or PVM defined on the full Hilbert space is quite uninteresting 
and irrelevant from the point of view of the physics. 


15.1 Examples of PVMs and Their Operators 


We have already noted that quantum equilibrium, i.e., Born's statistical law, leads
trivially to the position POVM. For a single particle, we have $(O_A)_{A \in \mathcal{B}(\mathbb{R}^3)}$ and
can use $O_A$ to compute the moments $\mathbb{E}^\varphi(X^n)$ of the position distribution. Since
the position $x \in \mathbb{R}^3$ is a vector, we need to specify what we mean by $x^n$. Most
straightforwardly, we take $x^n$ as meaning any of the possibilities

$$x^n = x_1^l\,x_2^k\,x_3^m \quad\text{with } l + k + m = n , \quad l, k, m \in \mathbb{N} ,$$

for $x = (x_1, x_2, x_3)$, or any linear combination thereof.


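Approximating the integral with sums over a disjoint decomposition $A_j$ of $\mathbb{R}^3$,
with a point $x_j$ chosen in each $A_j$, we write formally

$$\begin{aligned}
\mathbb{E}^\varphi(X^n) &= \lim \sum_j x_j^n\,\langle\varphi | O_{A_j}\varphi\rangle \\
&= \lim \Big\langle \varphi \,\Big|\, \sum_j x_j^n\,O_{A_j}\varphi \Big\rangle \\
&= \lim \Big\langle \varphi \,\Big|\, \Big(\sum_j x_j\,O_{A_j}\Big)^n \varphi \Big\rangle ,
\end{aligned} \qquad (15.4)$$

where we have used $O_{A_i}O_{A_j} = \delta_{ij}O_{A_j}$. Hence we obtain

$$\mathbb{E}^\varphi(X^n) = \Big\langle \varphi \,\Big|\, \Big(\int x\,dO_x\Big)^n \varphi \Big\rangle = \langle\varphi | \hat{x}^n\varphi\rangle .$$

This motivates the definition of the self-adjoint operator $\hat{x}$, the position operator. We
will discuss its domain later on:¹

$$\hat{x} = \int x\,dO_x , \qquad (15.5)$$

and we have

$$\hat{x}^n = \int x^n\,dO_x . \qquad (15.6)$$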
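
The step from the second to the third line of (15.4) rests only on $O_{A_i}O_{A_j} = \delta_{ij}O_{A_j}$. A discrete toy version makes this explicit. In the sketch below the six-point "configuration space", the three cells, and the representative values `x_rep` are illustrative assumptions; the projectors are diagonal indicator matrices.

```python
import numpy as np

dim = 6
cells = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]  # disjoint decomposition
x_rep = np.array([-1.0, 0.5, 2.0])                              # one value x_j per cell

def projector(idx):
    """Multiplication by the indicator function of the cell with indices idx."""
    P = np.zeros((dim, dim))
    P[idx, idx] = 1.0
    return P

O = [projector(c) for c in cells]

# Discrete analogue of the position operator: sum_j x_j O_{A_j}.
X = sum(xj * Oj for xj, Oj in zip(x_rep, O))

n = 3
power_of_sum = np.linalg.matrix_power(X, n)             # (sum_j x_j O_j)^n
sum_of_powers = sum(xj**n * Oj for xj, Oj in zip(x_rep, O))

phi = np.array([1.0, 2.0j, -1.0, 0.5, 1.0j, 3.0])
phi = phi / np.linalg.norm(phi)

moment_from_operator = float(np.real(np.vdot(phi, power_of_sum @ phi)))
moment_from_measure = sum(xj**n * float(np.real(np.vdot(phi, Oj @ phi)))
                          for xj, Oj in zip(x_rep, O))
print(moment_from_operator, moment_from_measure)
```

For a POVM that is not a PVM the cross terms $O_{A_i}O_{A_j}$ do not drop out, and the moments can no longer be written as expectation values of powers of a single operator.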


Remark 15.3. Vector Operators

When speaking of the position operator $\hat{x}$, we always have in mind the family
$(\hat{x}_1, \hat{x}_2, \hat{x}_3)$ of commuting self-adjoint operators (see Sect. 12.2.2)

$$\hat{x}_i = \int x_i\,dO_x ,$$

with

$$[\hat{x}_i, \hat{x}_j] = \hat{x}_i\hat{x}_j - \hat{x}_j\hat{x}_i = 0 ,$$

as follows from (15.5). The position operator $\hat{x}$ is thus a map


' We say this rather lightly here, although we know that, for an unbounded operator, the domain 
plays an important role. This is because, for multiplication operators like x, the domain of self- 
adjointness is straightforward: it is just the maximal domain. 


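$$\hat{x} : L^2(\mathbb{R}^3) \to L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3) , \qquad
\psi(x) \mapsto \hat{x}\,\psi(x) = \begin{pmatrix} x_1\psi(x) \\ x_2\psi(x) \\ x_3\psi(x) \end{pmatrix} .$$

This is often called the position representation of the position operator. We shall say
more about that later.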


The abstract one-to-one correspondence (15.5) between self-adjoint operators and 
PVMs is the content of the spectral theorem, which we will discuss in detail later 
on. 


15.1.1 Heisenberg Operators 


Let us move to the second operator. It comes out of the sequence (15.1), where
again the apparatus plays no role, i.e., we just look at the system at time $T = t$, viz.,
$\Psi_T = \psi_t$ and $F = \mathrm{id}$. Then for $A \in \mathcal{B}(\mathbb{R}^3)$ we have

$$\begin{aligned}
\mathbb{P}^{\psi_t}(A) &= \int_A |\psi(x, t)|^2\,d^3x \\
&= \int \psi^*(x, t)\,\chi_A(x)\,\psi(x, t)\,d^3x \\
&= \langle U(t)\psi \,|\, O_AU(t)\psi\rangle \\
&= \langle\psi \,|\, U(t)^*O_AU(t)\psi\rangle ,
\end{aligned}$$

with the POVM

$$O_A(t) = U(t)^*O_AU(t) .$$

Not only $O_A$, but also $O_A(t)$ is actually a PVM:

$$O_A(t)^2 = U(t)^*O_AU(t)U(t)^*O_AU(t) = U(t)^*O_AO_AU(t) = U(t)^*O_AU(t) .$$

And as in (15.5), we can introduce the corresponding self-adjoint operator

$$\hat{x}(t) = \int x\,dO_x(t) = U^*(t)\,\hat{x}\,U(t) , \qquad (15.7)$$

the so-called Heisenberg position operator at time $t$.

The Heisenberg position operator at time $t$ captures the statistics of the particle
position at time $t$. But we should not forget that this is just a way of rewriting
equivariance. The (Bohmian) position $X(t, x)$ of the particle at time $t$ is a random variable
with distribution

$$\mathbb{P}^{\psi_t}(A) = \mathbb{P}^\psi\big(\{x \mid X(t, x) \in A\}\big) . \qquad (15.8)$$

The position at time $t$ is random because the initial position of the particle is random!
We will use this again for the third operator, where we have the same situation, but
with a nontrivial function $F$.


15.1.2 Asymptotic Velocity and the Momentum Operator 


We come back now to Sect. 9.4 and ask for the asymptotic velocity of a freely
moving particle, i.e., $X(t, x)/t$. To simplify the notation, we once again set $\hbar/m = 1$
and thus look at the Schrödinger equation

$$i\frac{\partial}{\partial t}\varphi(x, t) = -\frac{1}{2}\Delta\varphi(x, t) , \qquad \varphi(x, 0) = \varphi_0(x) , \qquad x \in \mathbb{R}^3 . \qquad (15.9)$$

We need an asymptotic expression for $\varphi(x, t)$ at large times, which we will justify
rigorously in Remark 15.8. This is not a big issue, but the mathematics fits better
there than here. The stationary phase argument from Sect. 9.4 suffices for the
moment, where we found for the asymptotics of the solution of (15.9) that

$$\varphi(x, t) \sim \frac{e^{i x^2/2t}}{(it)^{3/2}}\,\hat\varphi_0\Big(\frac{x}{t}\Big) \quad\text{for } t \to \infty . \qquad (15.10)$$

The Bohmian trajectories following the asymptotic wave function are straight lines.
We pick up (9.26) once more, and recall how we get the following nice and relevant
example from (15.10), bringing us back to (15.1) and the PVMs.

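Consider the function $F_t(x) = x/t$. Then, with (15.8), we find

$$\begin{aligned}
\lim_{t\to\infty} \mathbb{P}^{\varphi_t}\big(F_t^{-1}(A)\big) &= \lim_{t\to\infty} \mathbb{P}^{\varphi_t}\Big(\Big\{x \,\Big|\, \frac{x}{t} \in A\Big\}\Big) \\
&= \lim_{t\to\infty} \mathbb{P}^{\varphi}\Big(\Big\{x \,\Big|\, \frac{X(t, x)}{t} \in A\Big\}\Big) \\
&= \lim_{t\to\infty} \int |\varphi(x, t)|^2\,\chi_A\Big(\frac{x}{t}\Big)\,d^3x \\
&= \lim_{t\to\infty} \frac{1}{t^3}\int \Big|\hat\varphi_0\Big(\frac{x}{t}\Big)\Big|^2\chi_A\Big(\frac{x}{t}\Big)\,d^3x \qquad [\text{by (15.10)}] \\
&= \int |\hat\varphi_0(k)|^2\,\chi_A(k)\,d^3k \\
&= \langle\hat\varphi_0 | O_A\hat\varphi_0\rangle ,
\end{aligned} \qquad (15.11)$$

where we substituted $k = x/t$. This is now a very interesting result. If we interpret
the random variable

$$V_t(x) := \frac{X(t, x)}{t}$$

as an approximation to the velocity of the Bohmian particle, then we have shown
that $V_t(x)$ converges in distribution to

$$V_\infty(x) = \lim_{t\to\infty} V_t(x) ,$$

where the random variable $V_\infty$ has distribution with density $|\hat\varphi(k)|^2$ (writing $\varphi = \varphi_0$
for the initial wave function). And this brings us to the momentum operator of quantum mechanics.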

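We now have the random variable $V_\infty$, the asymptotic velocity. In order to
continue as in (15.4), we need to express the probability

$$\mathbb{P}^\varphi(V_\infty \in A) = \langle\hat\varphi | O_A\hat\varphi\rangle$$

in terms of a POVM, say $(V_A)_{A \in \mathcal{B}(\mathbb{R}^3)}$. With the Fourier transformation $\mathcal{F}$ [see
(13.15)], which is unitary on $L^2$, we find that

$$\langle\hat\varphi | O_A\hat\varphi\rangle = \langle\varphi | \mathcal{F}^*O_A\mathcal{F}\varphi\rangle =: \langle\varphi | V_A\varphi\rangle . \qquad (15.12)$$

Thus $V_A = V_A^*$ (since $O_A = O_A^*$) and

$$V_A^2 = \mathcal{F}^*O_A\mathcal{F}\mathcal{F}^*O_A\mathcal{F} = \mathcal{F}^*O_A\mathcal{F}\mathcal{F}^{-1}O_A\mathcal{F} = \mathcal{F}^*O_A^2\mathcal{F} = \mathcal{F}^*O_A\mathcal{F} = V_A ,$$

whence $(V_A)_{A \in \mathcal{B}(\mathbb{R}^3)}$ is a PVM.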
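
A discrete analogue of (15.12) can be built from the unitary discrete Fourier transform. This is a sketch under stated assumptions: the grid size `N`, the index set `A`, and the test vector are illustrative, and the DFT only mimics the continuum Fourier transform. It checks that $V_A = \mathcal{F}^*O_A\mathcal{F}$ is again an orthogonal projection and that $\langle\varphi|V_A\varphi\rangle$ equals the weight of $|\hat\varphi|^2$ on $A$.

```python
import numpy as np

N = 8
# Unitary DFT matrix; with norm="ortho", F @ phi equals np.fft.fft(phi, norm="ortho").
F = np.fft.fft(np.eye(N), norm="ortho")

A = [1, 2, 5]                       # a "Borel set" of Fourier indices
O_A = np.zeros((N, N))
O_A[A, A] = 1.0                     # multiplication by the indicator function of A

V_A = F.conj().T @ O_A @ F          # discrete analogue of (15.12)

phi = np.arange(1, N + 1) * np.exp(0.3j * np.arange(N) ** 2)
phi = phi / np.linalg.norm(phi)
phi_hat = np.fft.fft(phi, norm="ortho")

prob_from_V = float(np.real(np.vdot(phi, V_A @ phi)))
prob_from_hat = float(np.sum(np.abs(phi_hat[A]) ** 2))
print(prob_from_V, prob_from_hat)
```

The same conjugation argument shows that $U(t)^*O_AU(t)$ is a PVM for any unitary $U(t)$; unitarity is the only property used.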

The asymptotic velocity is experimentally an easily accessible quantity, and it is
therefore convenient to introduce the corresponding self-adjoint velocity operator
$\hat{v}_\infty$ (or the momentum operator $\hat{p} = m\hat{v}_\infty$). It is constructed from $(V_A)_{A \in \mathcal{B}}$
analogously to (15.5) [see (15.3)] through

$$\hat{v}_\infty = \int k\,dV_k . \qquad (15.13)$$

Again we delay the discussion about its domain.


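Analogously to (15.4), the moments $\mathbb{E}^\varphi(V_\infty^n)$ of the asymptotic velocity are given
by

$$\mathbb{E}^\varphi(V_\infty^n) = \langle\varphi | \hat{v}_\infty^n\varphi\rangle , \qquad (15.14)$$

where

$$\hat{v}_\infty^n = \int k^n\,dV_k .$$

Hence with (15.12), we have

$$V^\varphi(A) = \langle\varphi | V_A\varphi\rangle = \langle\hat\varphi | O_A\hat\varphi\rangle = O^{\hat\varphi}(A) ,$$

and thus

$$\mathbb{E}^\varphi(V_\infty^n) = \int k^n\,dV^\varphi(k) = \int k^n\,dO^{\hat\varphi}(k) = \int k^n\,|\hat\varphi(k)|^2\,d^3k .$$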


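What about the domains now? We emphasized in the previous chapter that an operator
is only well defined if its domain is specified, which is a tricky matter in general.
However, for $\hat{x}$ and $\hat{v}_\infty$, this is simple. Each component $\hat{x}_i$ ($i = 1, 2, 3$) is a self-adjoint
operator on $\mathcal{D}(\hat{x}_i)$, and the action of $\hat{x}_i$ on $\psi \in \mathcal{D}(\hat{x}_i)$ is just multiplication:

$$\hat{x}_i : \psi \mapsto x_i\psi(x) .$$

It is straightforward to see that real-valued multiplication operators are self-adjoint
on their maximal domain (if the latter is dense). Hence,

$$\mathcal{D}(\hat{x}_i) = \{\psi \in L^2 \mid \|x_i\psi\| < \infty\} , \qquad \mathcal{D}(\hat{x}) = \{\psi \in L^2 \mid \|\,|x|\psi\,\| < \infty\} ,$$

are the proper domains for the position operator. To be precise, in (15.4), we thus
have to pick $\varphi$ in the domain.
For the asymptotic velocity operator, we find with (13.15) that

$$\langle\varphi | \hat{v}_\infty\psi\rangle = \int \hat\varphi^*(k)\,k\,\hat\psi(k)\,d^3k = \langle\hat\varphi \,|\, \widehat{(-i\nabla\psi)}\rangle = \langle\varphi \,|\, (-i\nabla)\psi\rangle .$$

Hence, for

$$\psi \in \mathcal{D}(\hat{v}_\infty) = \{\psi \in L^2 \mid \|\,|\nabla\psi|\,\| < \infty\} ,$$

we have

$$\hat{v}_\infty\psi = \frac{1}{i}\nabla\psi . \qquad (15.15)$$

Thus $\hat{x}$ and $\hat{v}_\infty$ are unbounded self-adjoint operators.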

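We should note that there is a noteworthy correspondence between the distributions
$|\varphi|^2$ and $|\hat\varphi|^2$. The variances are in some sense inverse. For Gaussians, this is
easy to see. The Fourier transform of a Gaussian is again a Gaussian, with width
inversely proportional to the width of the original. In general, one can only say that
the product of the variances is greater than or equal to 1/4. Here is an elegant proof.
For the widths $\Delta x$ and $\Delta v_\infty$ (the square roots of the variances), and for
$\varphi \in \mathcal{D}(\hat{x}) \cap \mathcal{D}(\hat{v}_\infty)$, one has

\[ (\Delta x)^2 = \langle \varphi | \hat{x}^2 \varphi \rangle - \langle \varphi | \hat{x} \varphi \rangle^2 =: \big\langle (\hat{x} - \langle \hat{x} \rangle)^2 \big\rangle =: \langle \tilde{x}^2 \rangle , \]

\[ (\Delta v_\infty)^2 = \langle \varphi | \hat{v}_\infty^2 \varphi \rangle - \langle \varphi | \hat{v}_\infty \varphi \rangle^2 =: \big\langle (\hat{v}_\infty - \langle \hat{v}_\infty \rangle)^2 \big\rangle =: \langle \tilde{v}^2 \rangle , \]

where $xy$ is the matrix $(x_i y_j)_{i,j}$, and thus

\[ \Delta x = \langle \tilde{x}^2 \rangle^{1/2} , \qquad \Delta v_\infty = \langle \tilde{v}^2 \rangle^{1/2} . \]

Cauchy-Schwarz and self-adjointness imply

\[ |\langle \tilde{x} \tilde{v} \rangle| \le \langle \tilde{x}^2 \rangle^{1/2} \langle \tilde{v}^2 \rangle^{1/2} , \qquad |\langle \tilde{v} \tilde{x} \rangle| \le \langle \tilde{x}^2 \rangle^{1/2} \langle \tilde{v}^2 \rangle^{1/2} . \]

Now (15.15) suggests looking at

\[ |\langle \tilde{x} \tilde{v} - \tilde{v} \tilde{x} \rangle| = |\langle [\tilde{x}, \tilde{v}] \rangle| = |\langle [\hat{x}, \hat{v}_\infty] \rangle| \le 2 \langle \tilde{x}^2 \rangle^{1/2} \langle \tilde{v}^2 \rangle^{1/2} , \]

since, with the unit matrix $E_3$, we have

\[ [\hat{x}, \hat{v}_\infty] \varphi(x) = x \Big( \frac{1}{i} \nabla \varphi \Big)(x) - \frac{1}{i} \nabla \big( x \varphi(x) \big) = -\frac{1}{i} E_3 \, \varphi(x) . \tag{15.16} \]

For $\varphi \in \mathcal{D}(\hat{v}_\infty) \cap \mathcal{D}(\hat{x})$, we thus have

\[ \Delta x \, \Delta v_\infty = \langle \tilde{x}^2 \rangle^{1/2} \langle \tilde{v}^2 \rangle^{1/2} \ge \frac{1}{2} E_3 . \tag{15.17} \]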
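As a numerical illustration of (15.17), the following Python sketch checks the one-dimensional case for real Gaussian wave functions, which should saturate the bound. It assumes the normalization phi(x) = (2 pi sigma^2)^(-1/4) exp(-x^2 / (4 sigma^2)), uses the identity <v^2> = ||phi'||^2 (valid because the velocity operator is -i d/dx), and replaces the integrals by plain Riemann sums. The function name `variances` and the grid parameters are illustrative choices, not taken from the text.

```python
import math


def variances(sigma, half_width=12.0, n=4001):
    """Return (Var x, Var v) for a real Gaussian with position width sigma.

    Hypothetical helper: phi(x) = (2 pi sigma^2)^(-1/4) exp(-x^2 / (4 sigma^2)),
    sampled on [-half_width * sigma, half_width * sigma] with n grid points.
    """
    h = 2 * half_width * sigma / (n - 1)
    xs = [-half_width * sigma + j * h for j in range(n)]
    norm = (2 * math.pi * sigma ** 2) ** -0.25
    phi = [norm * math.exp(-x * x / (4 * sigma ** 2)) for x in xs]

    # <x> = 0 by symmetry, so Var x is the integral of x^2 |phi|^2.
    var_x = sum(x * x * p * p for x, p in zip(xs, phi)) * h

    # <v> = 0 for real phi, and <v^2> = ||phi'||^2 (central differences).
    dphi = [(phi[j + 1] - phi[j - 1]) / (2 * h) for j in range(1, n - 1)]
    var_v = sum(d * d for d in dphi) * h
    return var_x, var_v


for sigma in (0.5, 1.0, 2.0):
    var_x, var_v = variances(sigma)
    # The product of the variances should be close to 1/4 for every sigma.
    print(sigma, var_x, var_v, var_x * var_v)
```

Narrowing the position distribution (smaller sigma) widens the velocity distribution by the same factor, which is the inverse relation between the two widths described above.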


Recall that we put $\hbar/m = 1$, since we are focused on mathematics in this chapter.
Reinstating $\hbar$ and $m$, we find that

\[ \hat{v}_\infty = \frac{\hbar}{m i} \nabla , \]

and with the momentum operator

\[ m \hat{v}_\infty = \frac{\hbar}{i} \nabla = \hat{p} , \tag{15.18} \]

our result (15.17) is just Heisenberg's uncertainty principle for position and momentum.
The relation (15.16) is Heisenberg's famous commutation relation

\[ [\hat{x}, \hat{p}] = i \hbar E_3 . \tag{15.19} \]


Now this is mathematically interesting, but not especially exciting, since the phys- 
ical mechanism behind it is absolutely trivial: the more the support of the initial 
wave function $\varphi_0$ is localized, the faster it spreads under the time evolution. Hence,
the better one knows the initial position of the particle (i.e., the smaller the width of
$|\varphi_0|^2$), the broader will be the distribution of the asymptotic position of the freely
moving particle. 

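Although that is really all there is to say, we remark as an aside that we can
differentiate the Heisenberg position operator (15.7) with respect to time to find,
using $U(t) = \exp(-itH/\hbar)$, that

\[ \frac{\mathrm{d}}{\mathrm{d}t} \hat{x}(t) = \frac{i}{\hbar} [H, \hat{x}(t)] . \]

With (15.18), we can write

\[ \frac{i}{\hbar} [H, \hat{x}] = \frac{\hat{p}}{m} , \]

and therefore

\[ \frac{\mathrm{d}}{\mathrm{d}t} \hat{x}(t) = \frac{\hat{p}(t)}{m} . \]

Hence, with (15.19), we get for small $t$ that

\[ [\hat{x}(t), \hat{x}(0)] \approx \Big[ \hat{x}(0) + \frac{\mathrm{d}\hat{x}(t)}{\mathrm{d}t} \Big|_{t=0} t \, , \, \hat{x}(0) \Big] = \frac{t}{m} [\hat{p}, \hat{x}] = -\frac{i \hbar t}{m} E_3 . \]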

This is an uncertainty relation between the initial position and the position after 
some short time, and once again it just expresses the fact that the wave function 
spreads. If the wave function can evolve freely, it separates into a superposition of 
wave packets with locally well defined velocity, and we find the situation considered
before.

Differentiating once more yields

\[ m \frac{\mathrm{d}^2}{\mathrm{d}t^2} \hat{x}(t) = -\nabla V(\hat{x}(t)) , \]


which is another form of Ehrenfest’s theorem, called Heisenberg’s equation of mo- 
tion. The simplicity and conciseness of this operator equation are impressive, and 
might prompt one to rethink the role of these operators. Maybe they are indeed more 
fundamental than we have been ready to admit? Why do these equations come out 
so nicely? Maybe the particle does have a momentum after all? Now the answer is 
simple. In Bohmian mechanics, the particle has no momentum, while in Newtonian 
or Hamiltonian mechanics it has momentum. That is all there is to it. 


Remark 15.4. The Position Operator for the Harmonic Oscillator 

Heisenberg’s equation of motion for the harmonic oscillator is easily solved, and 
one finds that the position operator is periodic with period T, i.e., $\hat{x}(T) = \hat{x}(0)$.
The statistics of the particle positions at time T and time 0 are identical. Therefore,
according to Remark 12.2, we can “measure the position operator at time 0” by
measuring the position of the particle at time T. However, for this to be true, the
position of the particle at time T need not agree with the position at time 0, since 
only the distributions actually need to agree. One can easily see from the example 
in (8.20) that, in general, the trajectory is not periodic in position space. 

In conclusion, we have found that a “measurement of the position operator” (a 
nonsensical statement anyway) is not always a measurement of the position. This is 
as unimportant as anything could be, but nevertheless often leads to confusion, and 
we should therefore take note of that.


15.2 The Spectral Theorem


15.2.1 The Dirac Formalism 


We now return to the statistical calculus based on self-adjoint operators. Recall the 
book-keeping operator (12.14), viz.,

\[ \hat{A} = \sum_\alpha \lambda_\alpha P_{\mathcal{H}_\alpha} , \tag{15.20} \]

which contains the statistics of a measurement process with possible outcomes
$\lambda_\alpha \in \Lambda$. The orthogonal projectors $P_{\mathcal{H}_\alpha}$ project onto the orthogonal subspaces
$\mathcal{H}_\alpha$ of initial states leading to the outcome $\lambda_\alpha$. If the possible results $\lambda_\alpha$ are
real, then $\hat{A}$ is self-adjoint. Why should measurement results be real? Because they
typically count the number of some units, and we use real numbers to count.

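If we compare (15.20) with (15.5) or (15.13), and write $P_{\lambda_\alpha}$ for $P_{\mathcal{H}_\alpha}$, we see that
the structure is the same. Moreover, the projectors $P_{\lambda_\alpha}$ form a PVM (at least on the
range of $\sum_\alpha P_{\lambda_\alpha}$), but this time we obtain a discrete measure supported on the points
$\lambda_\alpha$:

\[ \mathbb{P}^{\varphi}(A) = \sum_{\{\alpha \,\mid\, \lambda_\alpha \in A\}} \langle \varphi | P_{\lambda_\alpha} \varphi \rangle . \]

In the Dirac notation, one writes one-dimensional orthogonal projectors $P_{\lambda_\alpha}$ as

\[ P_{\lambda_\alpha} = |\varphi_{\lambda_\alpha}\rangle \langle \varphi_{\lambda_\alpha}| , \]

where the $\varphi_{\lambda_\alpha} \in \mathrm{Ran}\, P_{\lambda_\alpha}$ satisfy

\[ \langle \varphi_{\lambda_\alpha} | \varphi_{\lambda_\beta} \rangle = \delta_{\alpha\beta} . \]

If $(\varphi_{\lambda_\alpha})$ forms a basis, then

\[ \sum_{\lambda_\alpha} |\varphi_{\lambda_\alpha}\rangle \langle \varphi_{\lambda_\alpha}| = \mathrm{id}_{\mathcal{H}} . \]

One immediately sees that $\lambda_\alpha$ are the eigenvalues of $\hat{A}$, with $\mathcal{H}_\alpha$ the corresponding
eigenspaces, spanned by the eigenvectors $\varphi_{\lambda_\alpha}$. The advantage of this spectral
representation is that one can take functions of the operator:

\[ f(\hat{A}) = \sum_\alpha f(\lambda_\alpha) P_{\lambda_\alpha} . \tag{15.21} \]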
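A small Python sketch may help to see (15.21) at work for a real symmetric 2x2 matrix. The matrix [[2, 1], [1, 2]], the helper names `spectral_decomposition` and `apply_function`, and the restriction to a non-zero off-diagonal entry are all assumptions made for this illustration only. The sketch builds the eigenvalues and the rank-one projectors explicitly and then forms f(A) as the sum of f(lambda) times the projector.

```python
import math


def spectral_decomposition(a, b, d):
    """Eigenvalues and projectors of [[a, b], [b, d]], assuming b != 0."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        v = (b, lam - a)                      # unnormalized eigenvector
        n2 = v[0] ** 2 + v[1] ** 2
        proj = [[v[i] * v[j] / n2 for j in range(2)] for i in range(2)]
        pairs.append((lam, proj))
    return pairs


def apply_function(f, pairs):
    """f(A) = sum over eigenvalues of f(lambda) * projector, as in (15.21)."""
    return [[sum(f(lam) * proj[i][j] for lam, proj in pairs)
             for j in range(2)] for i in range(2)]


pairs = spectral_decomposition(2.0, 1.0, 2.0)
print([lam for lam, _ in pairs])                # the two eigenvalues
print(apply_function(lambda x: x * x, pairs))   # should reproduce the matrix square
print(apply_function(math.exp, pairs))          # exp(A) without any power series
```

Taking f(x) = 1 returns the identity matrix, which is the completeness relation stated just before (15.21).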


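This is one reason for diagonalizing matrices.² Does such a representation also exist
for $\hat{x}$ or $\hat{v}_\infty$? In fact, we can use the Dirac notation as a powerful formalism for doing
just that [see (9.27)]. Write

\[ \mathrm{d}O_x = |x\rangle \langle x| \, \mathrm{d}^3x , \tag{15.22} \]

\[ \langle x | x' \rangle = \delta(x - x') , \]

\[ \int \mathrm{d}O_x = O_{\mathbb{R}^3} = 1_{L^2(\mathbb{R}^3)} = \int |x\rangle \langle x| \, \mathrm{d}^3x , \]

and

\[ \mathrm{d}V_k = |k\rangle \langle k| \, \mathrm{d}^3k , \]

\[ \langle k | k' \rangle = \delta(k - k') , \]

\[ \int \mathrm{d}V_k = V_{\mathbb{R}^3} = 1_{L^2(\mathbb{R}^3)} = \int |k\rangle \langle k| \, \mathrm{d}^3k , \]

where, for $\psi, \varphi \in L^2(\mathbb{R}^3, \mathrm{d}^3x)$, we put

\[ \langle \varphi | x \rangle \langle x | \psi \rangle = \varphi^*(x) \, \psi(x) \]

and

\[ \langle \varphi | k \rangle \langle k | \psi \rangle = \hat\varphi^*(k) \, \hat\psi(k) . \tag{15.23} \]

2 This is just another way of writing the diagonal form of a self-adjoint matrix.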


One can now easily guess what $\langle x | k \rangle$ should look like. From here on, everybody
can find his or her own way of thinking about (15.22) to (15.23). For example, we
can pretend that $|x\rangle$ and $|k\rangle$ are elements of $L^2(\mathbb{R}^3)$ in order to associate the usual
geometric pictures with the formulas. We then write

\[ \hat{x} = \int x \, |x\rangle \langle x| \, \mathrm{d}^3x , \tag{15.24} \]

instead of (15.5), and

\[ \hat{v}_\infty = \int k \, |k\rangle \langle k| \, \mathrm{d}^3k , \tag{15.25} \]

instead of (15.13). For example, by showing the version of (15.6) and (15.14) in the
Dirac notation, i.e.,

\[ \hat{x}^n = \int x^n \, |x\rangle \langle x| \, \mathrm{d}^3x , \qquad \hat{v}_\infty^{\,n} = \int k^n \, |k\rangle \langle k| \, \mathrm{d}^3k , \]

we see how nicely and intuitively we can work with this formalism.

We now have (15.24) and (15.25), and it very much resembles (15.20). Formally,
we can even call $x$ (and analogously $k$) the eigenvalues of the position operator $\hat{x}$
corresponding to the eigenvectors $|x\rangle$, although we must not of course take this too
seriously. Clearly, the “eigenfunction” $|x\rangle$ is not an element of $L^2$, and for this reason
one often calls it a generalized eigenfunction.


Remark 15.5. Almost Eigenvalues and Approximate Eigenfunctions 

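Now any $\lambda \in \mathbb{R}$ is almost an eigenvalue of the (one-dimensional) position operator
in the following sense.³ For $\varepsilon > 0$, consider $\psi_\varepsilon(x) = (2\varepsilon)^{-1/2} \chi_{[\lambda-\varepsilon, \lambda+\varepsilon]}(x)$. Then
$\| \psi_\varepsilon \| = 1$ and

\[ \| (\hat{x} - \lambda) \psi_\varepsilon \|^2 = \int (x - \lambda)^2 |\psi_\varepsilon(x)|^2 \, \mathrm{d}x = \int_{\lambda-\varepsilon}^{\lambda+\varepsilon} (x - \lambda)^2 |\psi_\varepsilon(x)|^2 \, \mathrm{d}x \le \varepsilon^2 \| \psi_\varepsilon \|^2 \]

shows that $\psi_\varepsilon$ is a sequence of normalized vectors such that

\[ \lim_{\varepsilon \to 0} \| (\hat{x} - \lambda) \psi_\varepsilon \| = 0 . \]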

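One thus has a unified language without a uniform meaning. This carries with it a
danger of confusion, and for the mathematical formulation we therefore introduce
new names: spectrum, spectral family (= PVM), and spectral measure. The spectrum
of the operator $\hat{A}$ is the set of its eigenvalues $\{\lambda_\alpha\}$, and the spectrum of the operators
$\hat{x}$ and $\hat{v}_\infty$ is just $\mathbb{R}^3$. The PVM of $\hat{A}$ is $(P_A = \sum_{\lambda_\alpha \in A} P_{\lambda_\alpha})_{A \in \mathfrak{B}(\mathbb{R})}$, and the PVM
of $\hat{x}$ or $\hat{v}_\infty$ is $(O_A)_{A \in \mathfrak{B}(\mathbb{R}^3)}$ or $(V_A)_{A \in \mathfrak{B}(\mathbb{R}^3)}$, respectively.

The corresponding spectral measures are $\langle \psi | P_A \psi \rangle$, $\langle \psi | O_A \psi \rangle$, and $\langle \psi | V_A \psi \rangle$,
where $\langle \psi | P_A \psi \rangle = \sum_{\lambda_\alpha \in A} \langle \psi | P_{\lambda_\alpha} \psi \rangle$ is a discrete measure. Moreover, note that $\hat{x}$ is
diagonal, in the sense that $\hat{x}$ acts as a multiplication operator on $L^2(\mathbb{R}^3, \mathrm{d}^3x)$
(“position representation”). In the same way $\hat{v}_\infty$ is diagonal after Fourier transformation
(“momentum representation”). More precisely [see (15.12)],

\[ \hat{v}_\infty = \frac{1}{i} \frac{\partial}{\partial x} \ \text{on } L^2 \quad \text{corresponds to} \quad \hat{k} = \mathcal{F} \hat{v}_\infty \mathcal{F}^{-1} \ \text{on } L^2 . \]

It is this simple action as a multiplication operator in the corresponding “representation”
which lies at the heart of Dirac's formalism.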


15.2.2 Mathematics of the Spectral Theorem 


Now recall the Schrödinger equation and the Schrödinger operator, which must also
be self-adjoint, and recall the lesson from linear algebra that it is very fruitful to
look at the world, i.e., the vector space, from the point of view of the operator under
consideration. This means going to the eigenbasis of the operator, i.e., its spectral
representation. So can we do this for the Schrödinger operator $H$? Is there also a
spectrum, a PVM, and the corresponding spectral measure for $H$? The answer is
that there is indeed a corresponding calculus for general self-adjoint operators $T$,
called a $T$-representation. This is the content of the spectral theorem. It comes in
several different flavors which are, in a sense, all equivalent:

• The functional calculus form of the spectral theorem says that one can take
  functions $f(T)$ of self-adjoint operators $T$ in a consistent way.

• The multiplication operator form says that any self-adjoint operator $T$ becomes
  a multiplication operator in a suitable representation.

• The PVM form says that any self-adjoint operator $T$ has a unique associated
  PVM $P$, such that

\[ T = \int \lambda \, \mathrm{d}P_\lambda . \]

3 Analogous statements hold for operators with a continuous spectrum, something to which we
shall come soon.


We have already seen how all these various aspects can be very useful when working
with concrete operators like $\hat{x}$ or $\hat{v}_\infty$. So in this section, we will discuss all three
aspects in some detail. We start with some basic definitions. One central object is
the spectrum of an operator, which is the generalization of the set of eigenvalues of
a matrix. Since in general we no longer have eigenvalues, we need to generalize the
eigenvalue equation in a clever way.


Definition 15.3. Let $T$ be a closed operator with dense domain $\mathcal{D}(T)$ on a Hilbert
space $\mathcal{H}$. The resolvent set $\rho(T)$ is the set of all $\lambda \in \mathbb{C}$ such that

\[ (\lambda - T)^{-1} : \mathcal{H} \to \mathcal{D}(T) \subset \mathcal{H} \]

exists as a bounded operator in $\mathcal{L}(\mathcal{H})$. $R_\lambda(T) = (\lambda - T)^{-1}$ is called the resolvent
of $T$ at $\lambda$. The complement $\sigma(T)$ of $\rho(T)$ in $\mathbb{C}$ is the spectrum of $T$.


Remark 15.6. On the Resolvent and the Spectrum 


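(i) Let $T$ be bounded. For $|\lambda| > \|T\|$, we can express the resolvent as a convergent
Neumann series,

\[ R_\lambda(T) = \frac{1}{\lambda - T} = \frac{1}{\lambda} \, \frac{1}{1 - T/\lambda} = \frac{1}{\lambda} \Big( 1 + \sum_{n=1}^{\infty} \Big( \frac{T}{\lambda} \Big)^n \Big) . \tag{15.26} \]

Note that with $\|T^n\| \le \|T\|^n$, which is immediate from the definition

\[ \|T\| = \sup_{\|\psi\| = 1} \|T\psi\| , \]

we have

\[ \|R_\lambda(T)\| \le \frac{1}{|\lambda|} \sum_{n=0}^{\infty} \frac{\|T^n\|}{|\lambda|^n} \le \frac{1}{|\lambda|} \sum_{n=0}^{\infty} \frac{\|T\|^n}{|\lambda|^n} < \infty . \tag{15.27} \]

Hence the series converges in the operator norm and one can see, as in

\[ (1 - x) \sum_{n=0}^{N} x^n = 1 - x^{N+1} \to 1 , \qquad N \to \infty , \]

that the limit is indeed the resolvent.

(ii) The resolvent set $\rho(T)$ is open and as a consequence the spectrum $\sigma(T)$ is
closed. This can be seen as follows. Let $\lambda_0 \in \rho(T)$. The series expansion

\[ \frac{1}{\lambda - t} = \frac{1}{\lambda - \lambda_0 + \lambda_0 - t} = \frac{1}{\lambda_0 - t} \, \frac{1}{1 - \frac{\lambda_0 - \lambda}{\lambda_0 - t}} = \frac{1}{\lambda_0 - t} \sum_{n=0}^{\infty} \Big( \frac{\lambda_0 - \lambda}{\lambda_0 - t} \Big)^n , \qquad \frac{|\lambda_0 - \lambda|}{|\lambda_0 - t|} < 1 , \]

suggests defining the operator

\[ \tilde{R}_\lambda(T) = R_{\lambda_0}(T) \sum_{n=0}^{\infty} (\lambda_0 - \lambda)^n R_{\lambda_0}(T)^n , \tag{15.28} \]

which is well defined for $|\lambda - \lambda_0| < \|R_{\lambda_0}(T)\|^{-1}$. One checks as in (i) that
$(\lambda - T) \tilde{R}_\lambda(T) = 1 = \tilde{R}_\lambda(T) (\lambda - T)$, and therefore $\tilde{R}_\lambda(T) = R_\lambda(T)$. Hence
$\lambda \in \rho(T)$ and $\rho(T)$ is open. At the same time we have shown that the map

\[ \rho(T) \to \mathcal{L}(\mathcal{H}) , \qquad \lambda \mapsto R_\lambda(T) , \]

is analytic, i.e., that it can be expressed locally as the convergent power series
(15.28) with coefficients in $\mathcal{L}(\mathcal{H})$.

(iii) If $\lambda \in \mathbb{C}$ is an eigenvalue of $T$, i.e., $T\varphi_\lambda = \lambda \varphi_\lambda$ holds for some $\varphi_\lambda \in \mathcal{H}$
with $\varphi_\lambda \ne 0$, then $\lambda - T$ is not invertible, and thus $\lambda$ is in the spectrum $\sigma(T)$.
The set of eigenvalues is called the point spectrum $\sigma_p(T)$ of $T$, and we have
$\sigma_p(T) \subset \sigma(T)$.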
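To make part (i) of the remark concrete, the following Python sketch compares a truncated Neumann series with the directly computed inverse for a 2x2 example. The matrix T (norm 1), the complex value of lambda, the number of terms and all helper names are assumptions chosen for the illustration; the only requirement taken from the text is that the modulus of lambda exceeds the norm of T.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def neumann_resolvent(t, lam, terms=80):
    """Partial sum (1/lam) * sum_{n < terms} (t/lam)^n of the Neumann series."""
    scaled = [[t[i][j] / lam for j in range(2)] for i in range(2)]
    power = [[1, 0], [0, 1]]           # (t/lam)^0
    total = [[0, 0], [0, 0]]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(2)] for i in range(2)]
        power = matmul(power, scaled)
    return [[total[i][j] / lam for j in range(2)] for i in range(2)]


def direct_resolvent(t, lam):
    """(lam - t)^(-1) for a 2x2 matrix via the explicit inverse formula."""
    a, b = lam - t[0][0], -t[0][1]
    c, d = -t[1][0], lam - t[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


T = [[0.0, 1.0], [1.0, 0.0]]           # self-adjoint with ||T|| = 1
lam = 1.5 + 0.5j                        # |lam| > ||T||, so the series converges

approx = neumann_resolvent(T, lam)
exact = direct_resolvent(T, lam)
print(max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2)))
```

The truncation error decays like (||T|| / |lam|)^terms, which is the geometric bound behind (15.27).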

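Theorem 15.1. Let $T$ be self-adjoint. Then

(i) $\sigma(T) \subset \mathbb{R}$ and $\|(T - z)^{-1}\| \le |\Im(z)|^{-1}$.

(ii) $\sup_{\lambda \in \sigma(T)} |\lambda| = \|T\|$.

(iii) For $\lambda, \lambda' \in \sigma_p(T)$ and $\lambda \ne \lambda'$, one has $\langle \varphi_\lambda | \varphi_{\lambda'} \rangle = 0$.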


Proof. (i) Let $\lambda = x + iy$ with $x, y \in \mathbb{R}$ and $y \ne 0$. Then like $T$, the operator $y^{-1}(x - T)$
is also self-adjoint. According to Theorem 14.3, $S = y^{-1}(x - T) + i$ is a bijection and
thus has a bounded inverse by the open mapping theorem (see, e.g., [1]). But then
$yS = \lambda - T$ also has a bounded inverse and $\lambda \in \rho(T)$ follows. The bound on the
norm of the resolvent follows from

\[ \|(T - x - iy)\psi\|^2 = \|(T - x)\psi\|^2 + y^2 \|\psi\|^2 \ge y^2 \|\psi\|^2 , \quad \text{for all } \psi \in \mathcal{D}(T) , \]

for $z = x + iy$.

(ii) We showed already in the previous remark that $\Lambda := \sup_{\lambda \in \sigma(T)} |\lambda| \le \|T\|$. The
other direction is intuitively clear, when thinking of diagonal matrices with the
eigenvalues on the diagonal. To see it in general, we use analyticity of the resolvent.
Since the Neumann series (15.26) is the Laurent series for $R_\lambda(T)$ with expansion
point $\infty$, it converges for all $|\lambda| > \Lambda$. The radius of convergence is limited by the
singularity at $\Lambda$. Hence

\[ \sum_{n=0}^{\infty} \Big( \frac{T}{\lambda} \Big)^n \]

converges absolutely for $|\lambda| > \Lambda$. However, for self-adjoint operators, one has

\[ \|T^n\| = \|T\|^n , \tag{15.29} \]

and therefore

\[ \sum_{n=0}^{\infty} \frac{\|T^n\|}{|\lambda|^n} = \sum_{n=0}^{\infty} \frac{\|T\|^n}{|\lambda|^n} , \]

which implies that $|\lambda| > \|T\|$ and thus $\Lambda \ge \|T\|$. We still need to show (15.29), which
will follow immediately once we have the spectral theorem. For the argument above,
it suffices to show $\|T^{2^n}\| = \|T\|^{2^n}$, which can be seen as follows. For any bounded
operator $T$, one has $\|T^2\| \le \|T\|^2$, and also for self-adjoint $T$,

\[ \|T\|^2 = \sup_{\|\psi\| = 1} \|T\psi\|^2 = \sup_{\|\psi\| = 1} \langle T\psi | T\psi \rangle = \sup_{\|\psi\| = 1} \langle \psi | T^2 \psi \rangle \le \|T^2\| , \]

where we used Cauchy-Schwarz in the last step.

(iii) is shown as in linear algebra.


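For later use we note that, for self-adjoint $T$, we have just seen that

\[ \|T^2\| = \sup_{\|\psi\| = 1} |\langle T\psi | T\psi \rangle| , \]

which suggests that

\[ \|T\| = \Big( \sup_{\|\psi\| = 1} |\langle \sqrt{T} \psi | \sqrt{T} \psi \rangle| \Big) = \sup_{\|\psi\| = 1} |\langle \psi | T\psi \rangle| , \tag{15.30} \]

where the bracketed part in the middle is completely formal at the moment. But the
equality does indeed hold. Reality of the norm implies

\[ \|T\| = \sup_{\|\psi\| = 1} \Re \Big\langle \frac{T\psi}{\|T\psi\|} \Big| T\psi \Big\rangle \le \sup_{\|\varphi\| = 1, \, \|\psi\| = 1} \Re \langle \varphi | T\psi \rangle , \]

and with polarization, we get, for self-adjoint $T$ and $\|\varphi\| = \|\psi\| = 1$,

\[ \Re \langle \varphi | T\psi \rangle = \frac{1}{4} \Big[ \langle \varphi + \psi | T(\varphi + \psi) \rangle - \langle \varphi - \psi | T(\varphi - \psi) \rangle \Big] \]

\[ \le \frac{1}{4} \big( \|\varphi + \psi\|^2 + \|\varphi - \psi\|^2 \big) \sup_{\|\eta\| = 1} |\langle \eta | T\eta \rangle| \]

\[ = \sup_{\|\eta\| = 1} |\langle \eta | T\eta \rangle| \le \|T\| . \]

Hence (15.30) follows.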


For a not too technical start, we first construct the PVM associated with a bounded
self-adjoint operator $T$. The case of unbounded operators will be treated later on.
The idea of the construction is quite simple. It is clear that the PVM is supported
on the spectrum $\sigma(T)$, and with (15.21) in mind, we should build the characteristic
function $\chi_A(T)$ of a subset $A \subset \sigma(T)$, since $\chi_{\{\lambda_\alpha\}}(\hat{A}) = P_{\lambda_\alpha}$.

The characteristic functions of all Borel subsets of the spectrum yield the PVM of
$T$. However, this involves slightly more work than one would initially expect. One
might want to start with polynomials of $T$ (no problem with that) and then
approximate characteristic functions in a suitable sense. But here comes the problem.
What is a suitable sense? One probably has pointwise convergence of polynomials
$p_n$ in mind. To conclude the existence of a limiting operator $\chi(T)$, one would like
to use completeness of the space $\mathcal{L}(\mathcal{H})$ of linear bounded operators on a Hilbert
space $\mathcal{H}$ with respect to the operator norm. This completeness does indeed hold
true and is easily shown as an exercise. This works analogously to showing that the
space of continuous functions on a compact set is complete under the sup-norm.

So we know that $\mathcal{L}(\mathcal{H})$ with the operator norm $\| \cdot \|$ is complete. Now we need
to show that convergence of polynomials $p_n \to f$ implies convergence of the
corresponding operators $p_n(T) \to f(T)$. However, one quickly realizes that pointwise
convergence is not the right notion to start with here. It is much easier to consider
uniform convergence first, as the sup-norm goes well with the operator norm.
Consider a polynomial $p : \sigma(T) \to \mathbb{C}$. Then

\[ \|p(T)\| = \|p\|_\infty := \sup_{\lambda \in \sigma(T)} |p(\lambda)| . \tag{15.31} \]
A€o(T) 


The proof is not completely trivial, since we allow p to have complex coefficients. 
We do not really need this now, but it will help us later on, when considering the 
notion of a cyclic vector. 

For real polynomials $p(T)$ is self-adjoint, but for complex polynomials only
$p^*(T) p(T)$ is self-adjoint. But we planned ahead for this when showing (15.30):

\[ \|p(T)\|^2 = \sup_{\|\psi\| = 1} \langle p(T)\psi | p(T)\psi \rangle = \sup_{\|\psi\| = 1} \langle \psi | p^*(T) p(T) \psi \rangle \]

\[ = \|p^*(T) p(T)\| = \sup_{\lambda \in \sigma(p^*(T) p(T))} |\lambda| = \sup_{\lambda \in \sigma(T)} |p^*(\lambda) p(\lambda)| \]

\[ = \|p\|_\infty^2 . \]

However, we are not quite through yet, since we used the fact that, for polynomials
$p$, one has

\[ \sigma(p(T)) = p(\sigma(T)) = \{ \mu \mid \mu = p(\lambda) , \ \lambda \in \sigma(T) \} , \tag{15.32} \]

which then implies the equality

\[ \sup_{\lambda \in \sigma(p(T))} |\lambda| = \sup_{\lambda \in \sigma(T)} |p(\lambda)| . \]

To see (15.32), let $\lambda \in \sigma(T)$. Note that $p(x) - p(\lambda) = (x - \lambda) q(x)$ with a polynomial
factor $q(x)$, since $\lambda$ is a zero of the left-hand side. This implies

\[ p(T) - p(\lambda) = (T - \lambda) \, q(T) . \]

Since $(T - \lambda)$ is not invertible for $\lambda \in \sigma(T)$, $p(T) - p(\lambda)$ is not invertible for
$\lambda \in \sigma(T)$. We thus have $p(\lambda) \in \sigma(p(T))$, and therefore $p(\sigma(T)) \subset \sigma(p(T))$.
Conversely, let $\mu \in \sigma(p(T))$. According to the fundamental theorem of algebra,
$p(x) - \mu$ factorizes and therefore

\[ p(T) - \mu = a \prod_{i=1}^{N} (T - \lambda_i) , \]

with the leading coefficient $a \ne 0$ of $p$ and the zeros $\lambda_1, \ldots, \lambda_N$. Since $p(T) - \mu$ is
not invertible, there is at least one $\lambda_i$ such that $T - \lambda_i$ is not invertible, and with
$p(\lambda_i) = \mu$ we conclude that $\sigma(p(T)) \subset p(\sigma(T))$.
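The spectral mapping property (15.32) and the norm identity (15.31) can be checked by hand on a small example; the Python sketch below does so for a 2x2 matrix. The matrix T, the polynomial p(x) = x^2 + i x with a complex coefficient, and the helper names are assumptions made for the illustration. Eigenvalues of a 2x2 matrix are obtained from trace and determinant, and the operator norm is computed as the square root of the largest eigenvalue of the adjoint times the matrix.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def poly(coeffs, x):
    """Horner evaluation of a scalar polynomial; highest degree first."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def poly_of_matrix(coeffs, t):
    """Horner evaluation of p(t) for a 2x2 matrix t."""
    result = [[0, 0], [0, 0]]
    for c in coeffs:
        result = matmul(result, t)
        result[0][0] += c
        result[1][1] += c
    return result


def eigenvalues_2x2(m):
    """Both eigenvalues from the characteristic polynomial."""
    half_tr = (m[0][0] + m[1][1]) / 2
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(half_tr ** 2 - det)
    return [half_tr - root, half_tr + root]


def operator_norm(m):
    """Square root of the largest eigenvalue of (adjoint of m) times m."""
    adj = [[m[j][i].conjugate() for j in range(2)] for i in range(2)]
    return math.sqrt(max(ev.real for ev in eigenvalues_2x2(matmul(adj, m))))


T = [[2, 1], [1, 2]]                   # self-adjoint example matrix
coeffs = [1, 1j, 0]                    # p(x) = x^2 + i x
p_of_T = poly_of_matrix(coeffs, T)

spec_T = eigenvalues_2x2(T)
print(sorted(eigenvalues_2x2(p_of_T), key=abs))        # spectrum of p(T)
print(sorted((poly(coeffs, lam) for lam in spec_T), key=abs))  # p applied to the spectrum of T
print(operator_norm(p_of_T), max(abs(poly(coeffs, lam)) for lam in spec_T))
```

The first two printed lists should agree, and the two numbers in the last line should coincide, in line with (15.32) and (15.31).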


We can now make precise the idea of taking functions of operators in $\mathcal{L}(\mathcal{H})$. The
following theorem usually comes under the heading of a functional calculus for
self-adjoint operators.


Theorem 15.2. Let $T \in \mathcal{L}(\mathcal{H})$ be self-adjoint. There exists a unique mapping (the
functional calculus)

\[ \Phi : C(\sigma(T)) \to \mathcal{L}(\mathcal{H}) , \]

such that

\[ \Phi(f) =: f(T) , \]

where the latter is defined by the following requirements. For all $f, g \in C(\sigma(T))$
and $\alpha, \lambda \in \mathbb{C}$:

(i) $\Phi(1) = 1_{\mathcal{H}}$,

(ii) $f(x) = x \ \Rightarrow \ \Phi(f) = T$,

(iii) $\Phi$ is linear, i.e., $\Phi(f + g) = \Phi(f) + \Phi(g)$ and $\Phi(\alpha f) = \alpha \Phi(f)$,

(iv) $\Phi$ is multiplicative, i.e., $\Phi(fg) = \Phi(f) \Phi(g)$,

(v) $\Phi(f^*) = \Phi(f)^*$,

(vi) $\sigma(\Phi(f)) = f(\sigma(T)) = \{ \mu \in \mathbb{C} \mid \mu = f(\lambda) , \ \lambda \in \sigma(T) \}$,

(vii) $\|\Phi(f)\| = \|f\|_\infty$,

(viii) $T\psi = \lambda\psi \ \Rightarrow \ \Phi(f)\psi = f(\lambda)\psi$.


Conditions (i-iv) imply that, for polynomials $p$, we must define $\Phi(p) := p(T)$.
Since the remaining statements are evident or have been shown before for polynomials
$p(T)$, we have a unique functional calculus for polynomials. The lift to continuous
functions goes by density. Since $\sigma(T)$ is compact, polynomials are dense in
$C(\sigma(T))$ for the sup-norm. With (vii), this translates nicely to the level of operators.


Lemma 15.1. There is a unique extension of $\Phi$ from the linear space of polynomials
to its closure $C(\sigma(T))$ satisfying (i-viii).


Sketch of the Proof. Since $\Phi$ is a bounded linear map from a dense subspace (the
polynomials) of the normed space $(C(\sigma(T)), \| \cdot \|_\infty)$ into the complete normed space
$(\mathcal{L}(\mathcal{H}), \| \cdot \|)$, it has a unique bounded extension to all of $C(\sigma(T))$.⁴ We just need
to show that the properties (i-viii) in Theorem 15.2 survive the limit.

For (iv), approximate $f$ and $g$ by polynomials, $p_n \to f$ and $q_n \to g$. Then $p_n q_n \to fg$
and a simple triangulation using uniform boundedness of $\Phi(p_n)$ yields the result.
Straightforward approximation also yields (v), (vii), and (viii). For (vi), first let
$\mu \notin \mathrm{Ran}(f)$. Then $g = (f - \mu)^{-1}$ exists as a bounded function and
$\Phi(g) \Phi(f - \mu) = \Phi(1) = 1$ implies that

\[ \Phi(g) = \big( \Phi(f - \mu) \big)^{-1} = \big( \Phi(f) - \mu \big)^{-1} \]

is bounded. Thus $\mu \notin \sigma(\Phi(f))$. Conversely, let $\mu \in \mathrm{Ran}(f)$, i.e., $\mu = f(\lambda)$, for
some $\lambda \in \sigma(T)$. How can we show that $\mu \in \sigma(\Phi(f))$? We already know that,
for any polynomial $p_n$ in an approximating sequence $p_n \to f$, we do indeed have
$p_n(\lambda) \in \sigma(\Phi(p_n))$. Hence for any $\varepsilon > 0$ there exists an almost eigenvector $\psi_n$ with
$\|\psi_n\| = 1$ such that

\[ \big\| [\Phi(p_n) - p_n(\lambda)] \psi_n \big\| < \varepsilon . \]

Choosing $n$ large enough to ensure that $\|\Phi(p_n) - \Phi(f)\| < \varepsilon$ and $|p_n(\lambda) - \mu| < \varepsilon$,
it follows that

\[ \big\| [\Phi(f) - \mu] \psi_n \big\| = \big\| [\Phi(f) - \Phi(p_n) + \Phi(p_n) - p_n(\lambda) + p_n(\lambda) - \mu] \psi_n \big\| \]

\[ \le \|\Phi(f) - \Phi(p_n)\| + \big\| [\Phi(p_n) - p_n(\lambda)] \psi_n \big\| + |p_n(\lambda) - \mu| \]

\[ < 3\varepsilon . \]

Since $\varepsilon > 0$ was arbitrary, $(\Phi(f) - \mu)^{-1}$ cannot be bounded.

4 Pick a Cauchy sequence $p_n$ of polynomials that converges uniformly to $f \in C(\sigma(T))$. Then by

\[ \|\Phi(p_n) - \Phi(p_m)\| = \|\Phi(p_n - p_m)\| = \|p_n - p_m\|_\infty , \]

we also know that $\Phi(p_n)$ is a Cauchy sequence in the complete space $\mathcal{L}(\mathcal{H})$. Its limit defines the
unique extension.


Now we can use continuous functions to approximate measurable functions, which
is where we want to get to in the end. While this approximation can no longer
be done with respect to the sup-norm, we recall that, under an integral, pointwise
convergence is usually enough. Clearly, the mapping

\[ I : C(\sigma(T)) \to \mathbb{C} , \qquad f \mapsto I(f) := \langle \psi | f(T) \psi \rangle , \]

is linear and bounded. Moreover, it is positive in the sense that it takes non-negative
values $I(f)$ on non-negative functions $f$, since for $f \ge 0$ and $g := \sqrt{f}$, we have

\[ I(f) = \langle \psi | f(T) \psi \rangle = \langle \psi | g(T) g(T) \psi \rangle = \langle g(T)\psi | g(T)\psi \rangle = \|g(T)\psi\|^2 \ge 0 . \]

Therefore, we can quote a classical representation theorem, the Riesz-Markov
theorem, which tells us that there is a unique measure $\mu_T^\psi$ on $\sigma(T)$, associated
with this linear functional, such that

\[ I(f) = \langle \psi | f(T) \psi \rangle = \int_{\sigma(T)} f(\lambda) \, \mathrm{d}\mu_T^\psi(\lambda) , \quad \text{for all } f \in C(\sigma(T)) . \tag{15.33} \]

Indeed, the theorem states that $\mu_T^\psi$ is a regular Borel measure.⁵ We do not give the
proof here, which is somewhat laborious, but not very demanding.⁶

5 A Borel measure $\mu$ is a measure defined on the Borel $\sigma$-algebra $\mathfrak{B}(\Omega)$ of a topological space
$\Omega$. It is regular if compact sets have finite measure and if it is compatible with approximation by
compact sets from inside and open sets from outside:

\[ \mu(A) = \sup \{ \mu(C) \mid C \subset A , \ C \text{ compact} \} = \inf \{ \mu(O) \mid A \subset O , \ O \text{ open} \} , \]

for all $A \in \mathfrak{B}$.

6 The idea is simple. One can use continuous functions to approximate characteristic functions of
Borel subsets of the compact set $\sigma(T)$ in order to define an outer measure

\[ \mu^*(C) = \inf \{ I(f) \mid f \in C(\sigma(T)) , \ f \ge \chi_C \} , \]

for closed $C \subset \sigma(T)$ and

\[ \mu^*(A) = \sup \{ \mu^*(C) \mid C \subset A , \ C \text{ closed} \} , \]

for arbitrary subsets $A \subset \sigma(T)$. Then Carathéodory's construction yields the desired Borel measure
$\mu_T^\psi$.


Definition 15.4. The measure $\mu_T^\psi$ on $\sigma(T)$ defined in (15.33) is called the spectral
measure of $T$ for the vector $\psi$.


The rest is straightforward. By polarization, we define the complex measures

\[ \mu_T^{\varphi,\psi} = \frac{1}{4} \big( \mu_T^{\varphi+\psi} - \mu_T^{\varphi-\psi} + i \mu_T^{\varphi - i\psi} - i \mu_T^{\varphi + i\psi} \big) , \]

where the map $(\varphi, \psi) \mapsto \mu_T^{\varphi,\psi}$ is sesquilinear by construction. Hence, for bounded
Borel-measurable functions $f$, we can turn (15.33) into a definition:

\[ \langle \varphi | f(T) \psi \rangle := \int_{\sigma(T)} f(\lambda) \, \mathrm{d}\mu_T^{\varphi,\psi}(\lambda) . \]

Again this uniquely defines the operator $f(T) \in \mathcal{L}(\mathcal{H})$ according to Theorem 13.4.
Since we can now work with the integral representation of $\Phi(f) = f(T)$, pointwise
approximation of measurable functions by continuous functions⁷ suffices to lift the
properties (i-viii) of $\Phi$ to $\mathcal{B}(\sigma(T))$, the space of bounded Borel functions on the
spectrum. This functional calculus on $\mathcal{B}(\sigma(T))$ is unique if we add the following
property to Theorem 15.2:

(ix) If a uniformly bounded sequence $(f_n)$ in $\mathcal{B}(\sigma(T))$ converges pointwise to $f$,
then

\[ \operatorname*{s-lim}_{n \to \infty} f_n(T) = f(T) . \]

In conclusion, we now have Theorem 15.2 for functions in $\mathcal{B}(\sigma(T))$ with the
additional property (ix). This is all we need! It follows directly from (i-ix) that the
family of characteristic functions $(\chi_A(T))_{A \in \mathfrak{B}(\sigma(T))}$ defines a PVM. Each operator
$\chi_A(T)$ is an orthogonal projector because of (iv) and (v), $\chi_{\sigma(T)}(T) = 1_{\mathcal{H}}$ is just (i),
and sigma additivity follows from (iii) and (ix). Furthermore, by construction, the
spectral measure is given in terms of the PVM through

\[ \mu_T^\psi(A) = \int \chi_A(\lambda) \, \mathrm{d}\mu_T^\psi(\lambda) = \langle \psi | \chi_A(T) \psi \rangle . \]

Thus we have

\[ f(T) = \int f(\lambda) \, \mathrm{d}\chi_\lambda(T) . \tag{15.34} \]

This allows us to formulate the spectral theorem for bounded self-adjoint operators.

7 Lebesgue's theorem of dominated convergence is used here once again.


Theorem 15.3. There is a one-to-one correspondence between bounded self-adjoint
operators and compactly supported PVMs on $\mathbb{R}$ given by

\[ T \mapsto \big( \chi_A(T) \big)_{A \in \mathfrak{B}(\sigma(T))} , \quad \text{with (15.34)} , \]

and

\[ (P_A)_{A \in \mathfrak{B}(\mathbb{R})} \mapsto T = \int \lambda \, \mathrm{d}P_\lambda . \]

But is it really clear that a given PVM on a compact subset of $\mathbb{R}$ defines a self-adjoint
operator as claimed in the theorem? The answer is affirmative here because,
by Remark 15.1, we have

\[ \langle \psi | T\varphi \rangle = \int \lambda \, \mathrm{d}P^{\psi,\varphi}(\lambda) = \int \lambda \, \overline{\mathrm{d}P^{\varphi,\psi}(\lambda)} = \overline{\int \lambda \, \mathrm{d}P^{\varphi,\psi}(\lambda)} = \overline{\langle \varphi | T\psi \rangle} = \langle T\psi | \varphi \rangle . \]


15.2.3 Spectral Representations 


In this short section, we make some remarks concerning spectral measures and
“diagonalization” of self-adjoint operators. The idea is to write a self-adjoint operator
$T$ as a multiplication operator on $L^2(\sigma(T), \mathrm{d}\mu)$ with a suitable measure $\mu$. The
position operator, for example, is already given in this form. This is another variant of
the spectral theorem and essentially an elaboration of (15.33).

Let $T \in \mathcal{L}(\mathcal{H})$ be self-adjoint. We are looking for a measure $\mu$ on $\sigma(T)$ and
a unitary map $U : \mathcal{H} \to L^2(\sigma(T), \mathrm{d}\mu)$, such that $U T U^{-1}$ on $L^2(\sigma(T), \mathrm{d}\mu)$ is
diagonal, i.e., multiplication by the function $h(\lambda) = \lambda$. To get the analogy with the
diagonal representation of a self-adjoint $n \times n$ matrix $A$, one should think of a vector
$v \in \mathbb{C}^n$ as a function on the $n$ distinct eigenvalues of $A$, i.e., on $\sigma(A) = \{\lambda_1, \ldots, \lambda_n\}$.⁸
Now multiplication by the diagonal matrix $\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ translates into
multiplication by the function $\lambda$ on $L^2\big(\sigma(A), \sum_{i=1}^{n} \delta(\lambda - \lambda_i) \, \mathrm{d}\lambda\big)$.

How can we get an analogous result for operators? One idea would be to look, not
only for eigenvectors, but also for “generalized eigenvectors”, that is, for “all” solutions
of $T\varphi = \lambda\varphi$ ($\varphi$ not necessarily in $\mathcal{H}$). For example, the operator of asymptotic
velocity can be diagonalized by Fourier transformation, which can be seen as
representation with respect to the “generalized eigenfunctions” $e^{i k \cdot x}$. However, it clearly
makes no sense, for abstract Hilbert spaces $\mathcal{H}$, to ask for solutions of $T\varphi = \lambda\varphi$ outside
of $\mathcal{H}$. So we present a slightly different point of view which works in general,
and come back to generalized eigenfunctions later on.


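Definition 15.5. Let $T$ be a bounded self-adjoint operator on $\mathcal{H}$. A vector $\eta \in \mathcal{H}$
is called a cyclic vector for $T$ if $\mathrm{span}(T^n \eta)_{n=0}^{\infty}$ is dense in $\mathcal{H}$.


8 The case of degenerate eigenvalues requires one more step, as explained below.

We will give an example that shows that there need not be a cyclic vector, but that
will present no problem. One can always split $\mathcal{H}$ into a direct sum of orthogonal
subspaces for which cyclic vectors exist. But now assume that a cyclic vector $\eta$
exists. Then, for $\psi, \varphi \in \mathcal{H}$, there are bounded Borel functions $g$ and $f$ such that
$\psi = g(T)\eta$ and $\varphi = f(T)\eta$. It follows that

\[ \langle \psi | T\varphi \rangle = \langle g(T)\eta | T f(T)\eta \rangle = \langle \eta | g(T)^* T f(T) \eta \rangle \]

\[ = \int_{\sigma(T)} g^*(\lambda) \, \lambda \, f(\lambda) \, \mathrm{d}\langle \eta | \chi_\lambda(T) \eta \rangle \qquad [\text{by (15.34)}] \]

\[ = \int_{\sigma(T)} g^*(\lambda) \, f(\lambda) \, \lambda \, \mathrm{d}\mu_T^\eta(\lambda) , \tag{15.35} \]

where $\mathrm{d}\mu_T^\eta$ is the spectral measure corresponding to the cyclic vector $\eta$. Now we
simply define

\[ U : \mathcal{H} \to L^2(\sigma(T), \mathrm{d}\mu_T^\eta(\lambda)) , \qquad \psi \mapsto g , \]

and observe that $U$ is unitary:

\[ \langle U\psi | U\varphi \rangle_{L^2(\sigma(T), \mathrm{d}\mu_T^\eta(\lambda))} = \int g^*(\lambda) f(\lambda) \, \mathrm{d}\mu_T^\eta(\lambda) = \langle \eta | g^*(T) f(T) \eta \rangle \]

\[ = \langle g(T)\eta | f(T)\eta \rangle_{\mathcal{H}} = \langle \psi | \varphi \rangle_{\mathcal{H}} . \tag{15.36} \]

As (15.35) shows, $T$ acts as multiplication by $\lambda$ on $L^2(\sigma(T), \mathrm{d}\mu_T^\eta)$:

\[ (U T U^{-1} g)(\lambda) = \lambda \, g(\lambda) . \]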

However, the spectral measure $\mathrm{d}\mu_T^\eta$ is not unique, since there will be many cyclic
vectors. If $\eta$ is cyclic and $f \in \mathcal{B}(\sigma(T))$ is strictly positive, i.e., $f(\lambda) \ge c > 0$,
then $f(T)$ is invertible and commutes with $T$, whence $f(T)\eta$ is also cyclic. It is
not hard to see that multiplication by $\lambda$ on $L^2(\sigma(T), \mathrm{d}\mu)$ and multiplication by $\lambda$
on $L^2(\sigma(T), \mathrm{d}\nu)$ are unitarily equivalent if and only if the measures are equivalent,
i.e., if and only if they have the same sets of measure zero.


Let us note finally that the non-existence of cyclic vectors in general, and the resulting
need to split the Hilbert space into a direct sum of cyclic subspaces, is related
to the existence of degenerate eigenvalues. Let us give a simple example. Consider
$\mathcal{H} = \mathbb{R}^3$ and $T = P_{e_1}$, the orthogonal projection onto the first canonical basis vector
$e_1$. Then

\[ \mathrm{span}\big( (P_{e_1})^n v \big)_{n=0}^{\infty} = \mathrm{span}(v, P_{e_1} v) = \mathrm{span}(v, e_1) \]

is at most a two-dimensional plane, spanned by $v$ and $e_1$. Hence there is no cyclic
vector for $P_{e_1}$. This is due to the twofold degenerate eigenvalue 0, so that we need
two vectors $v$ and $v \wedge e_1$ as “cyclic” vectors. Then

\[ \mathbb{R}^3 = \mathrm{span}\big( P_{e_1}^n v \big)_{n=0}^{\infty} \oplus \mathrm{span}\big( P_{e_1}^n (v \wedge e_1) \big)_{n=0}^{\infty} = \mathrm{span}(v, e_1) \oplus \mathrm{span}(v \wedge e_1) . \]

If $T$ on $\mathbb{R}^3$ has three different eigenvalues with eigenvectors $v_i$, then each $v = \sum_i \alpha_i v_i$
with $\alpha_i \ne 0$ is cyclic.
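The example can be verified with a few lines of Python. In three dimensions the powers of T beyond the second add nothing new (by the Cayley-Hamilton theorem), so a vector v is cyclic exactly when v, Tv and T^2 v are linearly independent, i.e., when the determinant of the matrix with these columns is non-zero. The helper names `apply` and `krylov_det` and the concrete test vectors are assumptions made for this sketch.

```python
def apply(t, v):
    """Matrix-vector product for a 3x3 matrix given as nested lists."""
    return [sum(t[i][k] * v[k] for k in range(3)) for i in range(3)]


def krylov_det(t, v):
    """Determinant of the 3x3 matrix with columns v, t v, t^2 v."""
    c0 = v
    c1 = apply(t, c0)
    c2 = apply(t, c1)
    return (c0[0] * (c1[1] * c2[2] - c2[1] * c1[2])
            - c1[0] * (c0[1] * c2[2] - c2[1] * c0[2])
            + c2[0] * (c0[1] * c1[2] - c1[1] * c0[2]))


# Projection onto e_1: eigenvalue 0 is twofold degenerate.
P = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Three distinct eigenvalues 1, 2, 3 with the canonical basis as eigenvectors.
D = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]

print(krylov_det(P, [1.0, 2.0, 3.0]))   # vanishes: v is not cyclic for P
print(krylov_det(D, [1.0, 1.0, 1.0]))   # non-zero: all coefficients non-zero, v is cyclic
print(krylov_det(D, [1.0, 0.0, 1.0]))   # vanishes: one coefficient is zero
```

For the diagonal matrix D the determinant is a Vandermonde determinant times the product of the components of v, which is why every component has to be non-zero.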


15.2.4 Unbounded Operators 


Life would be easier if we could avoid this topic altogether. But the Schrödinger
operator is unbounded, so we should say briefly how the spectral theorem can also
be obtained for unbounded self-adjoint operators. The reader should bear in mind
that we made heavy use above of the fact that $\sigma(T)$ is a compact set, which is true
only for bounded operators. There are several different ways to derive the functional 
calculus, the spectral representation, and the spectral measures for unbounded self- 
adjoint operators. Instead of describing one method in detail, it is more instructive 
to sketch the different approaches. 

One way to proceed is to make use of the spectral theorem for bounded self-adjoint
operators. More precisely, one could look for a bounded self-adjoint operator
in the form of a bounded function of $T$ and use its spectral representation to
get the same for $T$. However, this idea is somewhat circular, because only the spectral
theorem for unbounded operators would allow us to talk about functions of $T$.
There is, however, one bounded function of $T$ that gets defined independently of the
functional calculus, namely the resolvent. Recall that, according to Theorem 15.1
for self-adjoint $T$, we know that $\sigma(T) \subset \mathbb{R}$ and therefore $(T \pm i)^{-1}$ are bounded
operators. On the other hand, they are not self-adjoint, since with

\[ \langle (T - i)^{-1} \psi | \varphi \rangle = \langle (T - i)^{-1} \psi | (T + i)(T + i)^{-1} \varphi \rangle \]

\[ = \langle (T + i)^* (T - i)^{-1} \psi | (T + i)^{-1} \varphi \rangle = \langle \psi | (T + i)^{-1} \varphi \rangle , \]

we have $[(T - i)^{-1}]^* = (T + i)^{-1}$. However, not only self-adjoint matrices, but
also normal ones are diagonalizable. We can prove the spectral theorem for normal
bounded operators⁹ in the same way as we did for self-adjoint bounded operators.
Since $(T + i)$ and $(T - i)$ commute, $(T + i)^{-1}$ and $(T - i)^{-1}$ also commute. Thus
$(T - i)^{-1}$ is normal and one can represent it as a multiplication operator on a suitable
$L^2$-space. We skip the construction of spectral measures for normal bounded
operators and also the construction of the spectral measures of $T$ from the one for
the resolvent, but instead follow an alternative route to the spectral theorem for
unbounded operators.

9 Each bounded operator can be written in terms of two self-adjoint operators:

\[ R = \frac{R + R^*}{2} + i \, \frac{R - R^*}{2i} . \]

They can be diagonalized simultaneously if and only if they commute, which is the case if and
only if $R$ and $R^*$ commute. Such operators are said to be normal.

Recall that the first step in the case of bounded self-adjoint operators was the 
construction of a functional calculus in Theorem 15.2. There the starting point was 
to consider polynomials. However, for an unbounded operator T, any polynomial 
p(T) is itself an unbounded operator, and approximation by polynomials is not an 
option. The next idea would be to take polynomials of the resolvent $R_z = (T - z)^{-1}$,
for $z$ in the resolvent set of $T$. Such polynomials are dense in $C_\infty(\mathbb{R})$, and one could
indeed use them in order to set up a functional calculus for unbounded self-adjoint
operators, if one could show the analogue of (vii) in Theorem 15.2. In the following
we will pursue the idea of writing $f(T)$ in terms of the resolvent $(T - z)^{-1}$, but
in a more explicit way than just saying “by density”. It turns out that the resulting 
explicit formula is very useful in applications. 

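To motivate the formula, consider first a compactly supported function $f : \mathbb{R}^2 \to \mathbb{C}$
which is continuously differentiable. In the following, we will identify the domain
$\mathbb{R}^2$ with $\mathbb{C}$ via $z = x + iy$. However, $f$ is not assumed to be holomorphic (it
cannot be, since it is compactly supported). The reader may know that the Cauchy
integral formula of complex analysis is just an application of Stokes' theorem. We
will now repeat the corresponding derivation, however, for our function $f$, which
is not holomorphic. Let $z_0$ be a point in the support of $f$, $\Gamma$ a compact subset of $\mathbb{C}$
with smooth boundary $\partial\Gamma$ such that $\mathrm{supp}\, f$ is contained in the interior $\Gamma^\circ$ of $\Gamma$, and
$B_\delta(z_0)$ the ball of radius $\delta$ around $z_0$. Interpreting

\[ \omega = f(z) (z_0 - z)^{-1} \, \mathrm{d}z \]

as a complex 1-form on the complement of $B_\delta(z_0)$, one finds that

\[ \mathrm{d}\omega = \frac{\partial}{\partial z} \big[ f(z) (z_0 - z)^{-1} \big] \, \mathrm{d}z \wedge \mathrm{d}z + \frac{\partial}{\partial \bar{z}} \big[ f(z) (z_0 - z)^{-1} \big] \, \mathrm{d}\bar{z} \wedge \mathrm{d}z \]

\[ = \frac{\partial f}{\partial \bar{z}}(z) \, (z_0 - z)^{-1} \, \mathrm{d}\bar{z} \wedge \mathrm{d}z , \]

where $\mathrm{d}z \wedge \mathrm{d}z$ and $\partial_{\bar{z}} (z_0 - z)^{-1}$ both vanish. Note that $\mathrm{d}\bar{z} \wedge \mathrm{d}z = 2i \, \mathrm{d}x \wedge \mathrm{d}y$, and take
$\delta > 0$ small enough to ensure that $B_\delta(z_0) \subset \Gamma^\circ$. Then Stokes' theorem implies

\[ \int_{\partial(\Gamma \setminus B_\delta(z_0))} \omega = \int_{\partial\Gamma} \omega - \int_{\partial B_\delta(z_0)} \omega = \int_{\Gamma \setminus B_\delta(z_0)} \mathrm{d}\omega . \tag{15.37} \]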


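By continuity of $\partial_{\bar z}f(z)$ and integrability of $(z_0-z)^{-1}$, the right-hand side of (15.37)
is given in the limit $\delta\to0$ by

\[ \lim_{\delta\to0}\int_{\Gamma\setminus B_\delta(z_0)} d\omega
   = 2i\lim_{\delta\to0}\int_{\Gamma\setminus B_\delta(z_0)}\frac{\partial f}{\partial\bar z}(z)\,(z_0-z)^{-1}\,dx\,dy
   = 2i\int_{\Gamma}\frac{\partial f}{\partial\bar z}(z)\,(z_0-z)^{-1}\,dx\,dy . \]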


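For the left-hand side of (15.37), we get, by continuity of $f$,

\[ \lim_{\delta\to0}\int_{\partial B_\delta(z_0)}\omega = \lim_{\delta\to0}\int_{\partial B_\delta(z_0)} f(z)(z_0-z)^{-1}\,dz = -2\pi i\,f(z_0) , \]

and since $f|_{\partial\Gamma} = 0$,

\[ \int_{\partial\Gamma}\omega = \int_{\partial\Gamma} f(z)(z_0-z)^{-1}\,dz = 0 . \]

Putting everything together, we have thus shown that

\[ f(z_0) = \frac{1}{\pi}\int_{\Gamma}\frac{\partial f}{\partial\bar z}(z)\,(z_0-z)^{-1}\,dx\,dy . \tag{15.38} \]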


Note that, for holomorphic $f$, one has $d\omega = 0$, but then the integral over $\partial\Gamma$ contributes.
In that case (15.37) yields the Cauchy integral formula.
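Formula (15.38) can be checked numerically. The following is only a sketch under stated assumptions: the test function $f(z) = e^{-|z|^2}$ is our choice (it is not compactly supported, but it decays fast enough that the boundary term is negligible), and the function name `pompeiu_rhs` and all grid parameters are hypothetical. In polar coordinates centred at $z_0$ one has $dx\,dy = r\,dr\,d\theta$ and $(z_0-z)^{-1} = -e^{-i\theta}/r$, so the singularity cancels.

```python
import numpy as np

def pompeiu_rhs(z0, n_r=400, n_theta=256, r_max=8.0):
    """Right-hand side of (15.38) for f(z) = exp(-|z|^2), evaluated in
    polar coordinates centred at z0, where the 1/r singularity cancels."""
    # Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, r_max]
    nodes, weights = np.polynomial.legendre.leggauss(n_r)
    r = 0.5 * r_max * (nodes + 1.0)
    w_r = 0.5 * r_max * weights
    # Equispaced angles: periodic trapezoidal rule
    theta = 2.0 * np.pi * np.arange(n_theta) / n_theta
    w_theta = 2.0 * np.pi / n_theta

    R, TH = np.meshgrid(r, theta, indexing="ij")
    z = z0 + R * np.exp(1j * TH)
    dbar_f = -z * np.exp(-np.abs(z) ** 2)   # (1/2)(d_x + i d_y) f
    # dx dy = r dr dtheta and (z0 - z)^{-1} = -exp(-i theta) / r
    integrand = dbar_f * (-np.exp(-1j * TH))
    return (w_r[:, None] * integrand).sum() * w_theta / np.pi

z0 = 0.3 - 0.2j
approx = pompeiu_rhs(z0)
exact = np.exp(-abs(z0) ** 2)
```

Here `approx` should agree with `exact` up to quadrature error, with a negligible imaginary part.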

Back to operators. The idea is now to replace the variable $z_0$ in (15.38) by a
possibly unbounded self-adjoint operator $T$, and to define

\[ f(T) = \frac{1}{\pi}\int_{\mathbb{R}^2}\frac{\partial f}{\partial\bar z}(z)\,(T-z)^{-1}\,dx\,dy . \]

There are two problems with this formula. One is that, while the integrand in (15.38)
is singular only at a point and thus integrable, $(T-z)^{-1}$ is singular on the spectrum of
$T$, which might contain intervals of the real line. On the other hand, we aim at a
functional calculus for functions on $\sigma(T)\subset\mathbb{R}$, and the second problem is that it is
not clear what $\partial_{\bar z}f(z)$ should mean for a function on $\mathbb{R}$.

The clever trick in the construction is now to solve both problems together. We
extend $f:\mathbb{R}\to\mathbb{C}$ to a function $\tilde f:\mathbb{C}\to\mathbb{C}$ in such a way that $\partial_{\bar z}\tilde f(z)$ vanishes at
least as fast as $|\Im(z)|$ when $z$ approaches the real axis, and thus compensates the possible $|\Im(z)|^{-1}$
divergence of the resolvent [see Theorem 15.1 (i)]. Such extensions are said to be
almost analytic, since $\partial_{\bar z}\tilde f(z)$ would vanish identically for an analytic extension. This
also suggests a way to construct almost analytic extensions.

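Let $f\in C^{n+1}(\mathbb{R})$. Then we can define $\tilde f(x+iy)$ by Taylor expansion at $x$,
pretending that the Cauchy-Riemann equation $\partial_y\tilde f(z) = i\,\partial_x\tilde f(z)$ holds:

\[ \tilde f(x+iy) = \sum_{j=0}^{n} f^{(j)}(x)\,\frac{(iy)^j}{j!} . \]

It is easy to check that

\[ \frac{\partial\tilde f}{\partial\bar z}(z) = \frac{1}{2}\,f^{(n+1)}(x)\,\frac{(iy)^n}{n!} = O\big(|\Im z|^n\big) , \quad\text{as } |\Im z|\to0 , \]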


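and thus $\tilde f$ is an almost analytic extension. Now $\tilde f$ is neither compactly supported
nor integrable, and therefore we need to multiply $\tilde f$ by a smooth, compactly supported cutoff function $\chi(y)$
such that $\chi|_{[-1,1]} = 1$. But then the integral

\[ f(T) := \frac{1}{\pi}\int_{\mathbb{R}^2}\frac{\partial\tilde f}{\partial\bar z}(z)\,(T-z)^{-1}\,dx\,dy \tag{15.39} \]

(with the cutoff now included in $\tilde f$) converges in norm and defines a bounded operator $f(T)$.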
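A numerical sketch of (15.39) in the simplest setting may help. The assumptions are ours: $T$ is replaced by a real symmetric $2\times2$ matrix, $f(x) = e^{-x^2}$, the extension uses $n = 1$, i.e., $\tilde f(x+iy) = \big(f(x) + iy f'(x)\big)\chi(y)$, and the integral is approximated by a midpoint rule whose grid avoids the real axis. The names `smooth_step`, `cutoff` and `gaussian_of_matrix` are hypothetical, and the particular cutoff is one of many possible choices.

```python
import numpy as np

def smooth_step(s):
    """C-infinity step: 0 for s <= 0, 1 for s >= 1."""
    def g(u):
        return np.where(u > 0, np.exp(-1.0 / np.maximum(u, 1e-300)), 0.0)
    return g(s) / (g(s) + g(1.0 - s))

def cutoff(y):
    """chi(y) = 1 for |y| <= 1 and chi(y) = 0 for |y| >= 2."""
    return 1.0 - smooth_step(np.abs(y) - 1.0)

def gaussian_of_matrix(A, h=0.02):
    """Approximate exp(-A^2) for a real symmetric matrix A via (15.39),
    using the n = 1 almost analytic extension and a midpoint rule."""
    x = np.arange(-8.0 + h / 2, 8.0, h)
    y = np.arange(-2.0 + h / 2, 2.0, h)      # midpoints never hit y = 0
    X, Y = np.meshgrid(x, y, indexing="ij")
    f0 = np.exp(-X ** 2)                     # f
    f1 = -2.0 * X * f0                       # f'
    f2 = (4.0 * X ** 2 - 2.0) * f0           # f''
    chi = cutoff(Y)
    eps = 1e-5
    dchi = (cutoff(Y + eps) - cutoff(Y - eps)) / (2.0 * eps)
    # d/d(zbar) of (f + i y f') chi(y), with d/d(zbar) = (d_x + i d_y)/2
    dbar = 0.5j * (Y * f2 * chi + (f0 + 1j * Y * f1) * dchi)
    z = (X + 1j * Y).ravel()
    eye = np.eye(A.shape[0])
    resolvents = np.linalg.inv(A[None, :, :] - z[:, None, None] * eye)
    total = (dbar.ravel()[:, None, None] * resolvents).sum(axis=0)
    return total * h * h / np.pi

A = np.array([[0.5, 0.3], [0.3, -0.2]])
approx = gaussian_of_matrix(A)
w, V = np.linalg.eigh(A)
exact = V @ np.diag(np.exp(-w ** 2)) @ V.T   # reference via diagonalization
```

The midpoint rule is crude near the eigenvalues, where the integrand is bounded but not smooth, so only a few digits of agreement between `approx` and `exact` should be expected.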

It is not difficult to show that this $f(T)$ is independent of $n$ and the specific form
of the cutoff, and that this formula defines a functional calculus for functions
$f\in C_0^\infty(\sigma(T))$ satisfying $\|f(T)\|\le\|f\|_\infty$. One can then pass by density to the closure

\[ \overline{C_0^\infty(\sigma(T))}^{\,\|\cdot\|_\infty} = C_\infty(\sigma(T)) , \]

the set of continuous functions vanishing at infinity. Using Riesz-Markov again,
which yields finite measures $\mu_\psi$ on $\mathcal{B}(\sigma(T))$, one finally extends the functional
calculus to bounded Borel functions $\mathcal{M}(\sigma(T))$. In this way one obtains the following
result.


Theorem 15.4. Let $T$ with domain $D(T)$ be a self-adjoint operator on $\mathcal{H}$. There
exists a unique functional calculus

\[ \Phi : \mathcal{M}(\sigma(T)) \to \mathcal{L}(\mathcal{H}) \]

with the following properties:

(i) $\Phi$ is linear and multiplicative.

(ii) $\Phi(\bar f) = \Phi(f)^*$.

(iii) $\|\Phi(f)\| \le \|f\|_\infty$.

(iv) For $z\in\mathbb{C}\setminus\mathbb{R}$ and $r_z(x) = (x-z)^{-1}$, one has $\Phi(r_z) = (T-z)^{-1}$.

(v) If a uniformly bounded sequence $(f_n)$ in $\mathcal{M}(\sigma(T))$ converges pointwise to
$f$, then

\[ \mathop{\text{s-lim}}_{n\to\infty}\Phi(f_n) = \Phi(f) . \]

Note that (iv) replaces requirement (ii) in the bounded case and guarantees that the
functional calculus is compatible with the definition of the resolvent. The proof of
(iv) is actually the most difficult part, and for this one extends formula (15.39) to a
class of functions containing $r_z(x)$ (see [2]).
Given the functional calculus, we can now define the PVM associated with a
self-adjoint operator $T$ as $(\chi_A(T))_{A\in\mathcal{B}(\sigma(T))}$, where for bounded functions
$f\in\mathcal{M}(\sigma(T))$, we once again have, by construction,

\[ f(T) = \int_{\sigma(T)} f(\lambda)\,d\chi_\lambda(T) . \]

However, for unbounded operators $T$, the spectrum $\sigma(T)$ is not bounded, and thus
the function $f(\lambda) = \lambda$ is not bounded on $\sigma(T)$. Hence,

\[ T = \int_{\sigma(T)} \lambda\,d\chi_\lambda(T) \]

is not an immediate consequence of the functional calculus. To understand this
precisely in the situation of unbounded operators, it is convenient to look first at the
spectral representation.

Assume for simplicity that $T$ has a cyclic vector $\eta$, i.e., every $\psi\in\mathcal{H}$ can be
written as $g(T)\eta$ for some $g\in\mathcal{M}(\sigma(T))$. Then one can construct the unitary mapping

\[ U : \mathcal{H}\to L^2\big(\sigma(T), d\mu_\eta(\lambda)\big) , \qquad \psi\mapsto g , \]

as in the case of bounded operators. We find once again that bounded functions of $T$
are mapped to multiplication by the corresponding function, i.e., for $f\in\mathcal{M}(\sigma(T))$,

\[ U f(T)\,U^{-1} = f(\lambda) . \tag{15.40} \]

The only additional point to show is that $T$ is indeed mapped under $U$ to multiplication
by $\lambda$. To see this, note first that (15.40) applied to the resolvent implies that
the range of the resolvent is mapped to the range of $r_i(\lambda) = (\lambda-i)^{-1}$, i.e.,

\[ U D(T) = \big\{\psi\in L^2(\sigma(T)) \,\big|\, \lambda\psi(\lambda)\in L^2(\sigma(T))\big\} . \]

But then, for $\psi\in UD(T)$, we have $\psi = r_i\varphi$ for some $\varphi\in L^2(\sigma(T))$, and

\[ UTU^{-1}\psi = UTU^{-1}r_i\varphi = UT(T-i)^{-1}U^{-1}\varphi = (i r_i + 1)\varphi = \lambda r_i\varphi = \lambda\psi . \]


At this point, let us formulate our findings as a theorem. However, before we do so,
we should get rid of the assumption that a cyclic vector exists. The following lemma
is not hard to show.


Lemma 15.2. Let $T$ with domain $D(T)$ be a self-adjoint operator on the separable
Hilbert space $\mathcal{H}$. Then there exists a sequence of pairwise orthogonal cyclic
subspaces $L_n\subset\mathcal{H}$ with cyclic vectors¹⁰ $\eta_n$ such that

\[ \mathcal{H} = \overline{\bigoplus_{n=1}^{N} L_n} , \]

where $N$ can be finite or $\infty$. Note that one only needs to take the closure for $N = \infty$.


¹⁰ Recall that $\eta$ is a cyclic vector for $L\subset\mathcal{H}$ if $L = \overline{\operatorname{span}}\{f(T)\eta \mid f\in\mathcal{M}(\sigma(T))\}$. It is clear that
a cyclic subspace is invariant for $T$, i.e., $f(T)L\subset L$ for any $f\in\mathcal{M}(\sigma(T))$.

Using the cyclic vector $\eta_n$, we can map each cyclic subspace $L_n$ unitarily to
$L^2(\sigma(T), d\mu_{\eta_n})$. In summary, we then obtain the following multiplication operator
version of the spectral theorem.


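Theorem 15.5. Let $T$ with domain $D(T)$ be a self-adjoint operator on a separable
Hilbert space $\mathcal{H}$. Let $(L_n,\eta_n)_n$ be a sequence of cyclic subspaces $L_n$ with cyclic
vectors $\eta_n$, as in Lemma 15.2. Then there is a unitary mapping

\[ U : \mathcal{H} = \bigoplus_{n=1}^{N} L_n \to \bigoplus_{n=1}^{N} L^2\big(\sigma(T), d\mu_{\eta_n}\big) = L^2\big(\sigma(T)\times\{1,2,\dots,N\}, d\mu\big) , \]

such that, for $f\in\mathcal{M}(\sigma(T))$ and $\psi\in L^2(\sigma(T)\times\{1,\dots,N\}, d\mu) =: L^2(d\mu)$,

\[ \big(U f(T)\,U^{-1}\psi\big)(\lambda,n) = f(\lambda)\,\psi(\lambda,n) . \]

Moreover,

\[ U D(T) = \big\{\psi\in L^2(d\mu) \,\big|\, \lambda\psi(\lambda,n)\in L^2(d\mu)\big\} , \]

and for all $\psi\in UD(T)$,

\[ \big(UTU^{-1}\psi\big)(\lambda,n) = \lambda\,\psi(\lambda,n) . \]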


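We can now use this spectral representation to formulate the precise connection
between unbounded self-adjoint operators and PVMs. To simplify the notation, we
assume that there is a cyclic vector $\eta$ for $T$ or, equivalently, we reduce our
considerations to one cyclic subspace. The key observation is that, with $g = U\psi$ and
$f = U\varphi$, we have, by construction,

\[ \langle\psi|\chi_A(T)\varphi\rangle = \langle\eta|g^*(T)\chi_A(T)f(T)\eta\rangle = \int_A g^*(\lambda)f(\lambda)\,d\mu_\eta(\lambda) , \]

and hence

\[ d\langle\psi|\chi_\lambda(T)\varphi\rangle = g^*(\lambda)f(\lambda)\,d\mu_\eta . \tag{15.41} \]

Thus for $\varphi\in D(T)$ and $\psi\in\mathcal{H}$, it follows that

\[ \langle\psi|T\varphi\rangle = \int_{\sigma(T)} g^*(\lambda)\,\lambda f(\lambda)\,d\mu_\eta(\lambda) = \int_{\sigma(T)}\lambda\,d\langle\psi|\chi_\lambda(T)\varphi\rangle . \]

This yields the first part of the PVM version of the spectral theorem.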


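Theorem 15.6. There is a one-to-one correspondence between self-adjoint operators
and PVMs on $\mathbb{R}$:

(i) Let $T$ be self-adjoint. Then $(\chi_A(T))_{A\in\mathcal{B}(\sigma(T))}$ is a PVM and, for $f\in\mathcal{M}(\sigma(T))$,

\[ f(T) = \int_{\sigma(T)} f(\lambda)\,d\chi_\lambda(T) . \]

For $\varphi\in D(T)$ and $\psi\in\mathcal{H}$, one has

\[ \langle\psi|T\varphi\rangle = \int_{\sigma(T)}\lambda\,d\langle\psi|\chi_\lambda(T)\varphi\rangle . \]

(ii) Given a PVM $(P_A)_{A\in\mathcal{B}(\mathbb{R})}$, let

\[ D(T) := \Big\{\varphi\in\mathcal{H} \,\Big|\, \int_{\mathbb{R}}\lambda^2\,d\langle\varphi|P_\lambda\varphi\rangle < \infty\Big\} . \]

Then

\[ \langle\psi|T\varphi\rangle = \int_{\mathbb{R}}\lambda\,d\langle\psi|P_\lambda\varphi\rangle , \quad\text{for } \psi,\varphi\in D(T) , \]

defines a self-adjoint operator $T$ with domain $D(T)$.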
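In finite dimensions, both directions of the correspondence can be made completely explicit. The sketch below is an illustration of ours, not part of the text: the matrix `T`, the helper `pvm`, and the chosen sets are hypothetical. It builds the projections $P_A = \chi_A(T)$ from an eigendecomposition, checks the PVM properties, rebuilds $T$ from the PVM, and confirms that the eigenbasis turns $T$ into multiplication by $\lambda$, as in Theorem 15.5.

```python
import numpy as np

T = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
eigvals, V = np.linalg.eigh(T)   # columns of V are orthonormal eigenvectors

def pvm(indicator):
    """P_A = chi_A(T): projection onto eigenvectors with eigenvalue in A."""
    mask = np.array([indicator(lam) for lam in eigvals], dtype=float)
    return (V * mask) @ V.T

P_low = pvm(lambda lam: lam < 2.5)          # A = (-inf, 2.5)
P_mid = pvm(lambda lam: 2.0 < lam < 4.0)    # B = (2, 4)
P_both = pvm(lambda lam: 2.0 < lam < 2.5)   # A intersected with B
P_all = pvm(lambda lam: True)               # A = R

# Discrete analogue of T = integral of lambda dP_lambda
T_rebuilt = sum(lam * pvm(lambda mu, lam=lam: abs(mu - lam) < 1e-9)
                for lam in eigvals)

# Spectral representation: in the eigenbasis, T is multiplication by lambda
T_diag = V.T @ T @ V
```

For this matrix the eigenvalues are 1, 3 and 5, so `P_low @ P_mid` vanishes, in agreement with $P_A P_B = P_{A\cap B}$.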
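To understand the second part of the theorem, take a sequence of simple functions
$f_n(\lambda) = \sum_{j=1}^{n} a_j^{(n)}\chi_{A_j^{(n)}}(\lambda)$ that converges monotonically to $f(\lambda) = |\lambda|$. Then, for
$\psi,\varphi\in D(T)$, we have

\[
\begin{aligned}
\int\lambda^2\,d\langle\varphi+\psi|P_\lambda(\varphi+\psi)\rangle
 &= \lim_{n\to\infty}\int f_n(\lambda)^2\,d\langle\varphi+\psi|P_\lambda(\varphi+\psi)\rangle \\
 &= \lim_{n\to\infty}\sum_{j=1}^{n}\big(a_j^{(n)}\big)^2\,\big\langle\varphi+\psi\big|P_{A_j^{(n)}}(\varphi+\psi)\big\rangle \\
 &\le \lim_{n\to\infty}\sum_{j=1}^{n}\big(a_j^{(n)}\big)^2\Big(\big\langle\varphi\big|P_{A_j^{(n)}}\varphi\big\rangle + \big\langle\psi\big|P_{A_j^{(n)}}\psi\big\rangle + 2\big|\big\langle\varphi\big|P_{A_j^{(n)}}\psi\big\rangle\big|\Big) \\
 &\le \int\lambda^2\,d\langle\varphi|P_\lambda\varphi\rangle + \int\lambda^2\,d\langle\psi|P_\lambda\psi\rangle
   + \lim_{n\to\infty}2\sum_{j=1}^{n}\big(a_j^{(n)}\big)^2\big\langle\varphi\big|P_{A_j^{(n)}}\varphi\big\rangle^{1/2}\big\langle\psi\big|P_{A_j^{(n)}}\psi\big\rangle^{1/2} < \infty ,
\end{aligned}
\]

since $\psi,\varphi\in D(T)$, and

\[
\begin{aligned}
\lim_{n\to\infty}\sum_{j=1}^{n}\big(a_j^{(n)}\big)^2\big\langle\varphi\big|P_{A_j^{(n)}}\varphi\big\rangle^{1/2}\big\langle\psi\big|P_{A_j^{(n)}}\psi\big\rangle^{1/2}
 &\le \lim_{n\to\infty}\bigg(\sum_{j=1}^{n}\big(a_j^{(n)}\big)^2\big\langle\varphi\big|P_{A_j^{(n)}}\varphi\big\rangle\bigg)^{1/2}\bigg(\sum_{j=1}^{n}\big(a_j^{(n)}\big)^2\big\langle\psi\big|P_{A_j^{(n)}}\psi\big\rangle\bigg)^{1/2} \\
 &= \bigg(\int\lambda^2\,d\langle\varphi|P_\lambda\varphi\rangle\bigg)^{1/2}\bigg(\int\lambda^2\,d\langle\psi|P_\lambda\psi\rangle\bigg)^{1/2} < \infty .
\end{aligned}
\]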


Hence, $\varphi+\psi\in D(T)$, and $D(T)$ is indeed a subspace. It is also dense, since for any
$\psi\in\mathcal{H}$, we clearly have $P_{[-n,n]}\psi\in D(T)$ and, by Definition 15.1 (i) and (iii),

\[ \lim_{n\to\infty} P_{[-n,n]}\psi = \psi . \]


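To see that the integral $\int_{\mathbb{R}}\lambda\,d\langle\psi|P_\lambda\varphi\rangle$ exists for $\psi,\varphi\in D(T)$, note that the explicit
decomposition of the complex measure $\mu_{\psi,\varphi}(A) := \langle\psi|P_A\varphi\rangle$ is

\[ \mu_{\psi,\varphi} = \frac{1}{4}\Big(\mu_{\psi+\varphi} - \mu_{\psi-\varphi} - i\,\mu_{\psi+i\varphi} + i\,\mu_{\psi-i\varphi}\Big) , \]

where $\mu_\omega(A) := \langle\omega|P_A\omega\rangle$. Let $\omega\in\{\psi+\varphi,\ \psi-\varphi,\ \psi+i\varphi,\ \psi-i\varphi\}$. Then $\omega\in D(T)$ and

\[ \int|\lambda|\,d\langle\omega|P_\lambda\omega\rangle \le \langle\omega|P_{[-1,1]}\omega\rangle + \int|\lambda|^2\,d\langle\omega|P_\lambda\omega\rangle < \infty . \]

Thus $T$ is a densely defined and obviously symmetric operator. To conclude that $T$
is self-adjoint, we use the criterion of Theorem 14.3, i.e., we show that $\operatorname{Ran}(T\pm i) = \mathcal{H}$
(we treat $T+i$; the case $T-i$ is analogous). So let $\psi\in\mathcal{H}$ and define

\[ \varphi = \int(\lambda+i)^{-1}\,dP_\lambda\,\psi . \]

We will show that, for any $\phi\in\mathcal{H}$,

\[ d\langle\phi|P_\lambda\varphi\rangle = \frac{1}{\lambda+i}\,d\langle\phi|P_\lambda\psi\rangle , \qquad
   d\langle\varphi|P_\lambda\varphi\rangle = \frac{1}{\lambda^2+1}\,d\langle\psi|P_\lambda\psi\rangle . \tag{15.42} \]

From this it immediately follows that $\varphi\in D(T)$ and, for any $\phi\in\mathcal{H}$,

\[ \langle\phi|(T+i)\varphi\rangle = \int(\lambda+i)\,d\langle\phi|P_\lambda\varphi\rangle = \int d\langle\phi|P_\lambda\psi\rangle = \langle\phi|\psi\rangle . \]

Hence $(T+i)\varphi = \psi$ and $T$ is therefore self-adjoint.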
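In order to prove (15.42), it is convenient to remark that, for any bounded Borel
function $f$, one can define the integral with respect to a PVM as a norm-convergent
limit of operators. Let $f_n = \sum_{j=1}^{n} a_j^{(n)}\chi_{A_j^{(n)}}$ be a sequence of simple functions
converging uniformly to $f$. Then

\[ \lim_{n\to\infty}\int f_n(\lambda)\,dP_\lambda = \lim_{n\to\infty}\sum_{j=1}^{n} a_j^{(n)}P_{A_j^{(n)}} \]

converges in norm, and the limit is independent of the sequence $f_n$ and equal to
the operator $\int f(\lambda)\,dP_\lambda$, as defined in Remark 15.1. This follows from the easily
checked fact that, for any simple function $g = \sum_{j=1}^{n} a_j\chi_{A_j}$, one has

\[ \Big\|\int g(\lambda)\,dP_\lambda\Big\| = \Big\|\sum_{j=1}^{n} a_jP_{A_j}\Big\| \le \|g\|_\infty . \]

Now back to (15.42). Let $\sum_{j=1}^{n} a_j^{(n)}\chi_{A_j^{(n)}}$ be a sequence of simple functions converging
uniformly to $(\lambda+i)^{-1}$. We can assume that, for each $n$, the sets $A_j^{(n)}$ are pairwise
disjoint. Then for any Borel set $A\subset\mathbb{R}$ and any $\phi\in\mathcal{H}$, we find that

\[ \langle\phi|P_A\varphi\rangle = \lim_{n\to\infty}\Big\langle\phi\Big|P_A\sum_{j=1}^{n} a_j^{(n)}P_{A_j^{(n)}}\psi\Big\rangle
   = \lim_{n\to\infty}\sum_{j=1}^{n} a_j^{(n)}\big\langle\phi\big|P_{A\cap A_j^{(n)}}\psi\big\rangle
   = \int_A\frac{1}{\lambda+i}\,d\langle\phi|P_\lambda\psi\rangle . \]

This proves the first equality in (15.42). The second follows analogously by

\[ \langle\varphi|P_A\varphi\rangle = \lim_{n\to\infty}\Big\langle\sum_{j=1}^{n} a_j^{(n)}P_{A_j^{(n)}}\psi\Big|P_A\sum_{j=1}^{n} a_j^{(n)}P_{A_j^{(n)}}\psi\Big\rangle
   = \lim_{n\to\infty}\sum_{j=1}^{n}\big|a_j^{(n)}\big|^2\big\langle\psi\big|P_{A\cap A_j^{(n)}}\psi\big\rangle
   = \int_A\frac{1}{\lambda^2+1}\,d\langle\psi|P_\lambda\psi\rangle . \]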


15.2.5 Unitary Groups 


Recall that one motivation for looking at self-adjoint operators was the expectation 
that self-adjoint operators are the generators of unitary groups. With the spectral 
theorem to hand it is now straightforward to show the following theorem. 


Theorem 15.7. Let $H$ with domain $D(H)$ be a self-adjoint operator. Then

\[ U(t) = e^{-iHt} \]

defined by the functional calculus is a strongly continuous unitary group and $H$ is
its generator.




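Proof. The group property and unitarity follow from properties (i) and (ii) of the
functional calculus of Theorem 15.4. Strong continuity follows from (v), because
$e^{-ixt}$ converges pointwise to $1$ for $t\to0$, so we have

\[ \mathop{\text{s-lim}}_{t\to0} e^{-iHt} = 1_{\mathcal{H}} . \]

Hence $U(t)$ is indeed a strongly continuous unitary group. To show that $H$ is indeed
the generator, we need to show that $e^{-iHt}\psi$ is differentiable if and only if $\psi\in D(H)$,
and that the derivative is $-iH\psi$. Note first that the difference quotient

\[ \Big\|\frac{e^{-iHt}-1}{t}\,\psi\Big\|^2 = \int\Big|\frac{e^{-i\lambda t}-1}{t}\Big|^2 d\langle\psi|P_\lambda\psi\rangle , \]

whose integrand converges to $\lambda^2$ as $t\to0$, remains bounded as $t\to0$ if and only if $\psi\in D(H)$. For $\psi\in D(H)$, the limit is
indeed $-iH\psi$, as can be seen from

\[ \lim_{t\to0}\Big\|\Big(\frac{e^{-iHt}-1}{t} + iH\Big)\psi\Big\|^2 = \lim_{t\to0}\int\Big|\frac{e^{-i\lambda t}-1}{t} + i\lambda\Big|^2 d\langle\psi|P_\lambda\psi\rangle = 0 , \]

where the integrand converges to $0$ as $t\to0$,
and dominated convergence. □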
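For a Hermitian matrix, the statements of Theorem 15.7 can be verified directly. The following is a minimal sketch with choices of our own: the $2\times2$ matrix `H`, the vector `psi`, and the function name `U` are illustrative assumptions. The functional calculus is realized by the eigendecomposition.

```python
import numpy as np

H = np.array([[1.0, 0.5], [0.5, -1.0]])
lam, V = np.linalg.eigh(H)

def U(t):
    """U(t) = exp(-iHt), defined through the functional calculus of H."""
    return (V * np.exp(-1j * lam * t)) @ V.conj().T

psi = np.array([1.0, 2.0]) / np.sqrt(5.0)

# Difference quotient (U(t) psi - psi)/t for small t, to compare with -i H psi
t_small = 1e-6
diff_quotient = (U(t_small) @ psi - psi) / t_small
```

In finite dimensions every vector lies in $D(H)$, so the difference quotient converges for all `psi`; the domain question only arises for unbounded $H$.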


Stone’s theorem now states that any strongly continuous unitary group is generated 
by a self-adjoint operator. This is comforting since it means that we can focus on the 
self-adjoint generators without taking the risk of missing some interesting unitary 
evolution groups. 


Theorem 15.8. (Stone's Theorem) Every strongly continuous unitary group $U(t)$
has a self-adjoint generator $H$, i.e., $U(t) = e^{-iHt}$.


The proof is not difficult with the machinery we have developed. But that is enough 
abstract mathematics for now. Let us return to quantum mechanics. 


15.2.6 $H_0 = -\Delta/2$


Now we have more or less proved that there is a PVM for every self-adjoint
operator, in particular also for the Schrödinger operator $H$, or any other operator that
seems relevant to us. But what is the PVM for $H = H_0 + V$? Good question! In general
we do not know, but what we are looking for are basically eigenfunctions in a
suitable generalized sense. More precisely, we also look for eigenfunctions for the
continuous spectrum. We shall now explain this for $H_0$ and discuss the extension to
Schrödinger operators with a potential in the next chapter. So for the moment, we
shall look for suitable solutions of

\[ H_0\varphi = -\frac{1}{2}\Delta\varphi = \lambda\varphi , \quad \lambda\in\mathbb{R} , \]

in which we ignore physical constants. The suitable solutions are of course the plane
waves, i.e., the Fourier "basis functions"

\[ \varphi = e^{\pm ik\cdot x} , \qquad \lambda = \frac{1}{2}k^2 . \]

This is nice, since we already know that the Fourier transformation is unitary on $L^2$
and that, in the Fourier representation, $H_0$ becomes the operator for multiplication
by $k^2/2$. We can thus write the PVM of $H_0$ explicitly as

\[ P^{H_0}(A) = \mathcal{F}^{-1}\chi_A\big(k^2/2\big)\,\mathcal{F} , \]

where $A$ is a Borel subset of $\mathbb{R}$. We see immediately that the support of $P^{H_0}$, i.e., the
spectrum, is $\sigma(H_0) = [0,\infty)$. More explicitly we have

\[ \big(P^{H_0}(A)\psi\big)(x) = \frac{1}{(2\pi)^3}\int_{\{k\,|\,k^2/2\in A\}} e^{ik\cdot x}\Big[\int e^{-ik\cdot y}\psi(y)\,d^3y\Big]d^3k . \]


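Recall the Dirac notation where this is written formally as

\[ P^{H_0}(A) = \int_{\{k\,|\,k^2/2\in A\}} |k\rangle\langle k|\,d^3k , \]

i.e., as projection onto the generalized eigenfunctions $|k\rangle$. To get the result in the $x$
representation, we need to project onto $|x\rangle$:

\[ \langle x|P^{H_0}(A)|\psi\rangle = \int_{\{k\,|\,k^2/2\in A\}}\langle x|k\rangle\langle k|\psi\rangle\,d^3k
   = \frac{1}{(2\pi)^{3/2}}\int_{\{k\,|\,k^2/2\in A\}} e^{ik\cdot x}\,\hat\psi(k)\,d^3k . \]

Finally, we can write $H_0$ as

\[ H_0 = \int\frac{k^2}{2}\,|k\rangle\langle k|\,d^3k , \]

which is just another notation for the statement we started with, i.e.,

\[ H_0 = \mathcal{F}^{-1}\,\frac{k^2}{2}\,\mathcal{F} . \]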
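On a periodic grid, the formulas $P^{H_0}(A) = \mathcal{F}^{-1}\chi_A(k^2/2)\mathcal{F}$ and $H_0 = \mathcal{F}^{-1}(k^2/2)\mathcal{F}$ translate into a few lines of FFT code. The sketch below is one-dimensional and entirely illustrative: the grid size, the box length, the Gaussian test function, and the function names `spectral_projection` and `apply_H0` are assumptions of ours, and the discrete Fourier transform only approximates $\mathcal{F}$.

```python
import numpy as np

N, L = 1024, 40.0
x = (np.arange(N) - N // 2) * (L / N)
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)   # discrete wave numbers

def spectral_projection(psi, e_min, e_max):
    """Discrete analogue of P^{H_0}([e_min, e_max)) = F^{-1} chi(k^2/2) F."""
    energy = 0.5 * k ** 2
    mask = (energy >= e_min) & (energy < e_max)
    return np.fft.ifft(mask * np.fft.fft(psi))

def apply_H0(psi):
    """Discrete analogue of H_0 = F^{-1} (k^2/2) F."""
    return np.fft.ifft(0.5 * k ** 2 * np.fft.fft(psi))

psi = np.exp(-x ** 2 / 2)
low = spectral_projection(psi, 0.0, 1.0)       # kinetic energy below 1
high = spectral_projection(psi, 1.0, np.inf)   # kinetic energy at least 1
H0_psi = apply_H0(psi)
H0_exact = 0.5 * (1.0 - x ** 2) * psi          # -(1/2) psi'' for the Gaussian
```

The two pieces `low` and `high` are orthogonal and add up to `psi`, which is the discrete counterpart of $P^{H_0}(A) + P^{H_0}(A^c) = 1$.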


This is a good point to use this representation in order to compute a few functions
of $H_0$ explicitly.


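Remark 15.7. The Free Propagator
In $n$ dimensions, the so-called free propagator is

\[ \langle y|e^{-itH_0}|x\rangle = \frac{1}{(2\pi it)^{n/2}}\,e^{i|y-x|^2/2t} . \]

This integral kernel yields the solutions of the free Schrödinger evolution. For an
initial wave function $\varphi\in L^2(\mathbb{R}^n)$, we have

\[ \varphi(x,t) = \big(e^{-itH_0}\varphi\big)(x) = L^2\text{-}\lim_{R\to\infty}\int_{|y|\le R}\langle x|e^{-itH_0}|y\rangle\langle y|\varphi\rangle\,d^ny
   = L^2\text{-}\lim_{R\to\infty}\frac{1}{(2\pi it)^{n/2}}\int_{|y|\le R} e^{i|y-x|^2/2t}\varphi(y)\,d^ny . \]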


One can compute the free propagator as follows. In the Fourier representation we
have

\[ e^{-itH_0} = \mathcal{F}^{-1}e^{-itk^2/2}\,\mathcal{F} . \]

Recalling that (inverse) Fourier transformation turns multiplication into convolution,
we can compute this directly for $\varphi\in\mathcal{S}$:

\[ \mathcal{F}^{-1}e^{-itk^2/2}\,\mathcal{F}\varphi = (2\pi)^{-n/2}\big(\mathcal{F}^{-1}G\big)*\varphi , \qquad G(k) := e^{-itk^2/2} , \]

and the inverse Fourier transform of a Gaussian is again a Gaussian [see (5.7) and
(9.19)]:

\[ \big(\mathcal{F}^{-1}G\big)(x) = \frac{1}{(it)^{n/2}}\,e^{i|x|^2/2t} . \]

To see that the integral representation holds for all functions in $L^2$ is somewhat
technical, and we refer the interested reader to [3].
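The Fourier representation $e^{-itH_0} = \mathcal{F}^{-1}e^{-itk^2/2}\mathcal{F}$ is also how the free evolution is usually computed in practice. As a hedged, one-dimensional sketch (grid parameters, the Gaussian initial state and the name `free_evolution` are our assumptions), we compare the FFT result with the closed-form solution $\varphi(x,t) = (1+it)^{-1/2}\exp\!\big(-x^2/2(1+it)\big)$ for $\varphi(x,0) = e^{-x^2/2}$, which is what convolution with the free propagator yields for this initial state.

```python
import numpy as np

N, L = 4096, 160.0
x = (np.arange(N) - N // 2) * (L / N)
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)

def free_evolution(psi0, t):
    """exp(-i t H_0) psi0 = F^{-1} exp(-i t k^2/2) F psi0 on a periodic grid."""
    return np.fft.ifft(np.exp(-0.5j * t * k ** 2) * np.fft.fft(psi0))

psi0 = np.exp(-x ** 2 / 2)
t = 2.0
psi_t = free_evolution(psi0, t)
exact = np.exp(-x ** 2 / (2 * (1 + 1j * t))) / np.sqrt(1 + 1j * t)

# Spreading bound in one dimension: sup|psi_t| <= (2 pi t)^{-1/2} ||psi0||_1
l1_norm = np.sum(np.abs(psi0)) * (L / N)
sup_bound = l1_norm / np.sqrt(2.0 * np.pi * t)
```

The box must be large enough that the spreading packet does not reach the periodic boundary during the time considered; for much larger `t` the length `L` has to grow accordingly.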

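From the explicit formula, one can directly read off the spreading of the wave
packet:

\[ \sup_x\big|\big(e^{-itH_0}\varphi\big)(x)\big| \le \frac{1}{(2\pi|t|)^{n/2}}\,\|\varphi\|_1 , \]

or the sojourn probability in a domain $G\subset\mathbb{R}^n$:

\[ \int_G\big|\big(e^{-itH_0}\varphi\big)(x)\big|^2 d^nx \le \frac{|G|}{(2\pi|t|)^{n}}\,\|\varphi\|_1^2 . \]

We already anticipated the precise asymptotics of the free time evolution for $t\to\infty$
when discussing $v_\infty$, using the heuristic argument in Sect. 9.4. Now the time has
come to give a proof.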


Remark 15.8. The "Free Asymptotics" and Stationary Phase
We will show that, for $\varphi\in L^2(\mathbb{R}^n)$,

\[ \Big\|e^{-itH_0}\varphi - \frac{e^{ix^2/2t}}{(it)^{n/2}}\,\hat\varphi\Big(\frac{x}{t}\Big)\Big\| \to 0 , \quad\text{for } t\to\infty . \tag{15.43} \]


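It suffices to prove this for $\varphi$ in the dense subset of Schwartz functions $\mathcal{S}$, since our
claim is that the difference between two $t$-dependent families of unitary operators
converges strongly to zero. And for $\varphi\in\mathcal{S}$ the proof is just a simple computation.
Multiplying out the square in the exponent in

\[ \big(e^{-itH_0}\varphi\big)(x) = \varphi(x,t) = \frac{1}{(2\pi it)^{n/2}}\int e^{i|x-y|^2/2t}\varphi(y)\,d^ny , \]

we find that

\[
\begin{aligned}
\varphi(x,t) &= \frac{e^{ix^2/2t}}{(2\pi it)^{n/2}}\int e^{-ix\cdot y/t}\,e^{iy^2/2t}\varphi(y)\,d^ny \\
 &= \frac{e^{ix^2/2t}}{(it)^{n/2}}\,\hat\varphi\Big(\frac{x}{t}\Big) + \frac{e^{ix^2/2t}}{(2\pi it)^{n/2}}\int e^{-ix\cdot y/t}\Big[\exp\Big(\frac{iy^2}{2t}\Big)-1\Big]\varphi(y)\,d^ny .
\end{aligned}
\]

Put

\[ h_t(y) = \Big[\exp\Big(\frac{iy^2}{2t}\Big)-1\Big]\varphi(y)\in\mathcal{S} . \]

Then the second term is

\[ r(x,t) = \frac{e^{ix^2/2t}}{(it)^{n/2}}\,\hat h_t\Big(\frac{x}{t}\Big) \]

and

\[ \|r(t)\|^2 = \int\frac{1}{|t|^{n}}\Big|\hat h_t\Big(\frac{x}{t}\Big)\Big|^2 d^nx = \|\hat h_t\|^2 = \|h_t\|^2 . \]

Now $h_t(y)$ converges pointwise to zero for $t\to\infty$ and, with $|h_t(y)|^2\le4|\varphi(y)|^2\in L^1$,
we can use dominated convergence to conclude that $\|h_t\|^2\to0$ for $t\to\infty$.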
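The convergence in (15.43) can be observed numerically in a case where both sides are known in closed form. The sketch below assumes $n = 1$ and $\varphi(x) = e^{-x^2/2}$, so that $\hat\varphi(k) = e^{-k^2/2}$ in the unitary convention and $\varphi(x,t) = (1+it)^{-1/2}\exp\!\big(-x^2/2(1+it)\big)$; the function name `asymptotic_error` and the grid are our own choices.

```python
import numpy as np

def asymptotic_error(t, points=200001):
    """L^2 distance between exp(-itH_0)phi and the asymptotic form in (15.43)
    for phi(x) = exp(-x^2/2) in one dimension."""
    half_width = 10.0 * np.sqrt(1.0 + t ** 2)   # covers the spread-out packet
    x = np.linspace(-half_width, half_width, points)
    dx = x[1] - x[0]
    exact = np.exp(-x ** 2 / (2 * (1 + 1j * t))) / np.sqrt(1 + 1j * t)
    approx = (np.exp(1j * x ** 2 / (2 * t)) / np.sqrt(1j * t)
              * np.exp(-(x / t) ** 2 / 2))      # phase * (it)^{-1/2} * phi-hat(x/t)
    return np.sqrt(np.sum(np.abs(exact - approx) ** 2) * dx)

errors = [asymptotic_error(t) for t in (5.0, 20.0, 80.0)]
```

For this example the error decays roughly like $1/t$, which is consistent with the size of $h_t$ in the proof above.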
Now recall (15.11). In the fourth step there, we really get 


6()fa Geren. 


and it remains to show that the remainder $R(t)$ goes to zero. With the above result,
Cauchy-Schwarz, and Parseval, this follows directly:


r= frotenG) envaalf nea 


aie 


1 
tim, \9(x,0) Pa (=) €= Jim 


t— 00 yn 


s ean IlFe|| — 




Remark 15.9. The Stationary Phase Method away from the Stationary Point

In (9.18), we discussed the stationary phase argument and just described the leading
order term rigorously. We can now also study the error terms, since we know how
a free wave packet moves for large times. Indeed, for large $t$, the wave function is
supported at positions $x$ in such a way that $x/t$ lies in the support of the Fourier
transform. Everywhere else, the wave function goes to zero. And it is easy to estimate
how fast it goes to zero. Here is a precise statement.


Theorem 15.9. Let $\varphi\in\mathcal{S}$ and $K = \operatorname{supp}(\hat\varphi)$ be compact. Let $W$ be an open
$\varepsilon$-neighborhood of $K$, i.e., $\operatorname{dist}(W^c,K) = \varepsilon>0$. Then for any $N\in\mathbb{N}$, there is a constant
$C_N$ such that, for any pair $x$, $t$ with $x/t\notin W$ and $|t|\ge1$,

\[ \big|\big(e^{-itH_0}\varphi\big)(x)\big| \le C_N\,(1+|t|)^{-N} . \]


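This simple "no stationary phase" statement is just based on integration by parts.
Observe that

\[
\begin{aligned}
\big(e^{-itH_0}\varphi\big)(x) &= \frac{1}{(2\pi)^{n/2}}\int\exp\big[i\big(k\cdot x - k^2t/2\big)\big]\,\hat\varphi(k)\,d^nk \\
 &= \frac{1}{(2\pi)^{n/2}}\int\exp\Big[i(1+|t|)\,\frac{k\cdot x - k^2t/2}{1+|t|}\Big]\hat\varphi(k)\,d^nk ,
\end{aligned}
\]

and put $\alpha = 1+|t|$. Then the integrand contains the oscillating exponential $e^{i\alpha S}$ with
the "phase function"

\[ S(k) = \frac{k\cdot x - k^2t/2}{1+|t|} , \quad\text{where}\quad \nabla_kS = \frac{x-kt}{1+|t|} . \]

But by assumption, on the support $K$ of $\hat\varphi$, we have

\[ |\nabla_kS(k)| = \frac{\big|\frac{x}{t}-k\big|}{\frac{1}{|t|}+1} \ge \frac{1}{2}\operatorname{dist}(W^c,K) = \frac{1}{2}\varepsilon > 0 . \]

The gradient of $S$ is therefore bounded from below by $\varepsilon/2$, and we get the identity

\[ e^{i\alpha S(k)} = \Big[\frac{1}{i\alpha}\,|\nabla S(k)|^{-2}\,\nabla S(k)\cdot\nabla\Big]^N e^{i\alpha S(k)} , \]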

where we have dropped the $x$ and $t$ dependence in the notation. Integration by parts
then yields ($\hat\varphi\in\mathcal{S}$)

\[ \int e^{i\alpha S(k)}\hat\varphi(k)\,d^nk = i^N\Big(\frac{1}{\alpha}\Big)^N\int e^{i\alpha S(k)}\Big[\nabla\cdot\frac{\nabla S(k)}{|\nabla S(k)|^2}\Big]^N\hat\varphi(k)\,d^nk , \]

where each factor in the square brackets acts on everything to its right.
Now it is easy to see that the integrand is bounded independently of $x$ and $t$, whence
the absolute value of the integral yields exactly the constant $C_N$. □

Note that, in Theorem 15.9, the stronger statement

\[ \big|\big(e^{-itH_0}\varphi\big)(x)\big| \le C_N\,(1+|t|+|x|)^{-N} \]

actually holds. This can be shown by putting $\alpha = 1+|t|+|x|$ in the proof. However,
estimating $\nabla_kS$ and the integrand above then takes more effort. We have skipped
this in order to focus on the simple structure of the argument. Moreover, for the
following application, our simple result is sufficient.
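The contrast between points with $x/t$ inside and outside the support of $\hat\varphi$ is easy to see numerically. The following one-dimensional sketch makes several assumptions of its own: $\hat\varphi$ is taken to be a standard smooth bump supported in $[1,2]$, the integral is evaluated by a plain Riemann sum (which is very accurate here because the integrand and all its derivatives vanish at the endpoints), and the names `bump` and `psi` are hypothetical.

```python
import numpy as np

def bump(k):
    """Smooth function supported in [1, 2]; plays the role of phi-hat."""
    d = (k - 1.0) * (2.0 - k)
    return np.where(d > 0, np.exp(-1.0 / np.maximum(d, 1e-300)), 0.0)

def psi(x, t, points=40001):
    """(2 pi)^{-1/2} * integral of exp(i(k x - k^2 t / 2)) phi-hat(k) dk."""
    k = np.linspace(1.0, 2.0, points)
    dk = k[1] - k[0]
    integrand = np.exp(1j * (k * x - 0.5 * k ** 2 * t)) * bump(k)
    return integrand.sum() * dk / np.sqrt(2.0 * np.pi)

t = 200.0
inside = abs(psi(1.5 * t, t))   # x/t = 1.5 lies inside supp(phi-hat)
outside = abs(psi(0.0, t))      # x/t = 0 has distance 1 from the support
```

At `x/t = 1.5` the phase has a stationary point and the amplitude is of order $t^{-1/2}$, whereas at `x/t = 0` repeated integration by parts, as in the proof, makes the amplitude smaller than any inverse power of $t$.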

Assume that $\hat\varphi\in C_0^\infty(\mathbb{R}^n\setminus\{0\})$, i.e., there is $a>0$ such that $|k|>a$ for all
$k\in\operatorname{supp}(\hat\varphi)$. It is now a simple consequence of the above computation that the domain
from which the wave function escapes grows with time. More precisely, it holds that

\[ \big\|\chi\big(|x|<a|t|\big)\,e^{-itH_0}\varphi\big\| \le C_N\,(1+|t|)^{-N} . \tag{15.44} \]

The proof of this estimate comes immediately out of the constraint that $|x|/|t|<a<|k|$
and the resulting estimate

\[ \big|\big(e^{-itH_0}\varphi\big)(x)\big| \le C_M\,(1+|t|)^{-M} , \]

which yields

\[ \int\chi\big(|x|<a|t|\big)\,\big|\big(e^{-itH_0}\varphi\big)(x)\big|^2 d^nx \le C'\,(1+|t|)^{-2M}\,(a|t|)^n \le C''\,(1+|t|)^{-2N} , \]

for appropriate constants $C'$, $C''$, and $M$ large enough. □

As we saw from the support of its PVM, the spectrum of $H_0$ is $[0,\infty)$. Thus the
resolvent $(H_0-\lambda)^{-1}$ exists on $\mathbb{C}\setminus\mathbb{R}^+$ and we will now compute its integral kernel.


Remark 15.10. The Resolvent of $-\Delta$
We will show that for all $\kappa\in\mathbb{C}$ with $\Re\kappa>0$ and all $\varphi\in L^2(\mathbb{R}^3)$,

\[ \big[(-\Delta+\kappa^2)^{-1}\varphi\big](x) = \frac{1}{4\pi}\int\frac{e^{-\kappa|x-y|}}{|x-y|}\,\varphi(y)\,d^3y . \tag{15.45} \]

Before we give the derivation of this formula, note that the integral kernel or Green's
function

\[ G(x) := \frac{1}{4\pi}\,\frac{e^{-\kappa|x|}}{|x|} \]

is integrable and square integrable, i.e., $G\in L^1(\mathbb{R}^3)\cap L^2(\mathbb{R}^3)$. Hence by the Cauchy-Schwarz
inequality, the integral on the right-hand side of (15.45) exists for all
$\varphi\in L^2(\mathbb{R}^3)$ and $x\in\mathbb{R}^3$. By Young's inequality, convolution with $G$ defines a bounded
operator on $L^2(\mathbb{R}^3)$. More precisely, let $\psi := G*\varphi$. Then with Fubini, we have that

\[ \|\psi\|^2 \le \int\!\!\int|\varphi(y)|\,|\varphi(y')|\Big[\int|G(x-y)|\,|G(x-y')|\,d^3x\Big]d^3y\,d^3y' \le \|G\|_1^2\,\|\varphi\|^2 . \]


Hence both sides in (15.45) define bounded operators on $L^2$ and it suffices to prove
the equality for $\varphi$ in the dense set of Schwartz functions $\mathcal{S}$. To show this, we use
the Fourier representation again:

\[ (-\Delta-\lambda)^{-1} = \mathcal{F}^{-1}\,(k^2-\lambda)^{-1}\,\mathcal{F} , \]

where it is convenient to put $\lambda = -\kappa^2$ and to assume $\Re\kappa>0$. (We thereby cover the
whole resolvent set $\mathbb{C}\setminus\mathbb{R}^+$ of $-\Delta$. But we could just as well take the other square
root of $-\lambda$. Then in the application of the residue theorem below, we would close the
path of integration through the lower half plane instead of the upper half plane.)

Let $f(k) := (k^2+\kappa^2)^{-1}$ and $f_R(k) = \chi_{|k|<R}(k)f(k)$, where $\chi_{|k|<R}(k)$ is the
characteristic function of the ball $B_R$ around the origin with radius $R$. Then according to
Remark 13.5, we have, for $\varphi\in\mathcal{S}$ and for all $x$,

\[
\begin{aligned}
\big((-\Delta+\kappa^2)^{-1}\varphi\big)(x) &= \big(\mathcal{F}^{-1}f\hat\varphi\big)(x) = \lim_{R\to\infty}\big(\mathcal{F}^{-1}f_R\hat\varphi\big)(x) \\
 &= \lim_{R\to\infty}\Big(\frac{1}{2\pi}\Big)^{3/2}\int\big(\mathcal{F}^{-1}f_R\big)(x-y)\,\varphi(y)\,d^3y .
\end{aligned}
\]

Here we have used the fact that $f_R\hat\varphi$ converges to $f\hat\varphi$ in $L^1$, whence the inverse Fourier
transform even converges uniformly.
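We will now compute

\[
\begin{aligned}
\lim_{R\to\infty}\Big(\frac{1}{2\pi}\Big)^{3/2}\big(\mathcal{F}^{-1}f_R\big)(x)
 &= \lim_{R\to\infty}\Big(\frac{1}{2\pi}\Big)^{3}\int_{|k|<R}\frac{e^{ik\cdot x}}{k^2+\kappa^2}\,d^3k \\
 &= \lim_{R\to\infty}\Big(\frac{1}{2\pi}\Big)^{3}\int_0^R\!\!\int_{-1}^{1}\!\!\int_0^{2\pi}\frac{e^{i|k||x|\cos\theta}}{k^2+\kappa^2}\,k^2\,d\phi\,d(\cos\theta)\,dk \\
 &= \lim_{R\to\infty}\frac{1}{(2\pi)^2}\,\frac{1}{i|x|}\int_{-R}^{R}\frac{e^{ik|x|}}{(k-i\kappa)(k+i\kappa)}\,k\,dk .
\end{aligned}
\]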


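Now we can apply the residue theorem. We close the integration along a rectangular
path in the upper half plane, going from $R$ to $R+i\sqrt R$, from there to $-R+i\sqrt R$, and
finally to $-R$. For $R$ large enough, this path encircles the pole at $k = i\kappa$. Hence we
find that

\[ \lim_{R\to\infty}\Big(\frac{1}{2\pi}\Big)^{3/2}\big(\mathcal{F}^{-1}f_R\big)(x) = \frac{1}{4\pi}\,\frac{e^{-\kappa|x|}}{|x|} , \]

granted that the contributions of the extra pieces of the path vanish in the limit
$R\to\infty$. We show this for the piece $\gamma(t) = R+ti\sqrt R$, $t\in[0,1]$, from $R$ to $R+i\sqrt R$.
Observe that

\[
\begin{aligned}
\Big|\frac{1}{i|x|}\int_\gamma\frac{k\,e^{ik|x|}}{k^2+\kappa^2}\,dk\Big|
 &= \Big|\frac{1}{i|x|}\int_0^1\frac{(R+ti\sqrt R)\,e^{i(R+ti\sqrt R)|x|}}{(R+ti\sqrt R)^2+\kappa^2}\,i\sqrt R\,dt\Big| \\
 &\le \frac{C_\kappa}{|x|}\int_0^1\frac{e^{-t\sqrt R|x|}}{\sqrt R}\,dt
  = \frac{C_\kappa}{|x|^2}\,\frac{1-e^{-\sqrt R|x|}}{R} ,
\end{aligned}
\]

with a $\kappa$-dependent constant $C_\kappa$. In spherical coordinates with $r = |x|$, ignoring constants, we
find for the squared $L^2$-norm,

\[
\begin{aligned}
\int_0^\infty\frac{\big(1-e^{-\sqrt Rr}\big)^2}{r^4R^2}\,r^2\,dr
 &= \int_0^{1/R}\frac{\big(1-e^{-\sqrt Rr}\big)^2}{r^2R^2}\,dr + \int_{1/R}^\infty\frac{\big(1-e^{-\sqrt Rr}\big)^2}{r^2R^2}\,dr \\
 &\le \int_0^{1/R}\frac{R\,r^2}{r^2R^2}\,dr + \int_{1/R}^\infty\frac{1}{r^2R^2}\,dr \\
 &\le \frac{1}{R^2}+\frac{1}{R} \to 0 , \quad\text{as } R\to\infty .
\end{aligned}
\]
□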
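A quick consistency check of (15.45) is to confirm that convolution with $G$ acts in Fourier space as multiplication by $(k^2+\kappa^2)^{-1}$. For a radial function in $\mathbb{R}^3$ one has $\int G(x)e^{-ik\cdot x}d^3x = \frac{4\pi}{k}\int_0^\infty r\sin(kr)\,G(r)\,dr$, which for our $G$ reduces to $\frac{1}{k}\int_0^\infty\sin(kr)e^{-\kappa r}dr$. The sketch below evaluates this by the trapezoidal rule; the real value of `kappa`, the truncation radius and the name `green_fourier` are assumptions made for illustration.

```python
import numpy as np

def green_fourier(k, kappa, r_max=60.0, points=600001):
    """Fourier multiplier of convolution with G(x) = exp(-kappa|x|)/(4 pi |x|)
    in R^3, i.e. (1/k) * integral_0^inf sin(k r) exp(-kappa r) dr."""
    r = np.linspace(0.0, r_max, points)
    dr = r[1] - r[0]
    y = np.sin(k * r) * np.exp(-kappa * r)
    integral = dr * (y.sum() - 0.5 * (y[0] + y[-1]))   # trapezoidal rule
    return integral / k

kappa = 1.0
ks = [0.5, 1.0, 3.0]
numeric = [green_fourier(k, kappa) for k in ks]
expected = [1.0 / (k ** 2 + kappa ** 2) for k in ks]
```

The same check works for complex $\kappa$ with $\Re\kappa>0$, as long as `r_max` is chosen large enough for $e^{-\Re\kappa\,r}$ to be negligible at the cutoff.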
Remark 15.11. The Resolvent of $-c^2\Delta$
In the case of $H_0$, one has a prefactor in front of $-\Delta$, whence

\[
\begin{aligned}
\big((-c^2\Delta+\kappa^2)^{-1}\varphi\big)(x) &= \frac{1}{c^2}\Big(\Big(-\Delta+\frac{\kappa^2}{c^2}\Big)^{-1}\varphi\Big)(x) \\
 &= \frac{1}{4\pi c^2}\int\frac{e^{-|x-y|\kappa/c}}{|x-y|}\,\varphi(y)\,d^3y .
\end{aligned}
\]
□

15.2.7 The Spectrum


We understand now that a self-adjoint operator can be written as a sum or an integral
over a family of pairwise orthogonal projections, i.e., in terms of its associated PVM.
The PVM is a measure supported on the spectrum of the operator, and we may ask
what further relevance the spectrum has. This is a very natural question, given that
the quick acceptance of quantum mechanics was mainly based on the successful
explanation of spectral lines of atoms in terms of eigenvalues of the Schrödinger
operator. The eigenvalues are part of the spectrum. But the spectrum can also be
continuous, as in the case of $H_0$ or $x$. Or it can have both eigenvalues and continuous
parts, like the Hamiltonian of the hydrogen atom, for example. How can one see
what type of spectrum a given operator has?

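The spectral measure corresponding to an eigenvector $\varphi_0$ with eigenvalue $\lambda_0$ is a
point measure $\mu_{\varphi_0} = \delta(\lambda-\lambda_0)\|\varphi_0\|^2$. To see this, note first that $(H-z)^{-1}\varphi_0 =
(\lambda_0-z)^{-1}\varphi_0$ and the functional calculus of Theorem 15.4 imply that, for any
bounded Borel function $f$, we have $f(H)\varphi_0 = f(\lambda_0)\varphi_0$. But then it follows for all
$f\in\mathcal{M}(\sigma(H))$ that

\[ f(\lambda_0)\|\varphi_0\|^2 = \langle\varphi_0|f(H)\varphi_0\rangle = \int f(\lambda)\,d\mu_{\varphi_0}(\lambda) \]

and thus $\mu_{\varphi_0} = \delta(\lambda-\lambda_0)\|\varphi_0\|^2$.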

In contrast to the spectral measure of an eigenvector, the spectral measure $\mu_\eta$
generated by a cyclic vector $\eta$ is supported on all of $\sigma(H)$, and we can therefore
think of it as "the" spectral measure of $H$. It will have continuous parts and parts
supported on points. This suggests first looking at general measures on the spectrum,
independently of a corresponding vector. But this is once again abstract and
simple mathematics: a regular Borel measure $\mu$, which in particular is finite on compact
sets, can be decomposed into three parts. Let $P := \{x\in\mathbb{R} \mid \mu(\{x\})\ne0\}$ be the
set of points which have nonzero measure and define, for measurable $A\subset\mathbb{R}$, the
point measure part of $\mu$ as

\[ \mu_{\mathrm{pp}}(A) = \sum_{x\in P\cap A}\mu(\{x\}) = \mu(P\cap A) . \]

If $\mu = \mu_{\mathrm{pp}}$, we say that $\mu$ is a point measure. Naturally, one defines the continuous
part of $\mu$ as $\mu_{\mathrm{c}} = \mu-\mu_{\mathrm{pp}}$. Applied to spectral measures, the idea is that $\mu_{\mathrm{c}}$ is
supported on the continuous part of the spectrum. But this continuous support can
be very different from what we think of as the "continuum". For example, the Cantor
set has the same cardinality as the continuum. And what we have in mind is a
measure like Lebesgue measure. Let us say more precisely what we mean by "like"
Lebesgue measure.


Definition 15.6. We say that a measure $\mu$ on $\mathbb{R}$ is absolutely continuous with respect
to Lebesgue measure $\lambda$, and write $\mu\ll\lambda$, if

\[ \lambda(A) = 0 \implies \mu(A) = 0 , \]

i.e., if all Lebesgue null sets are also $\mu$-null sets.


The Radon-Nikodym theorem states that $\mu\ll\lambda$ implies that $\mu$ has a density with
respect to $\lambda$, i.e., that $d\mu = \rho\,d\lambda$ for some locally integrable function $\rho$. Let us give
a short sketch of the proof. Let $(\mathbb{R},\mathcal{B}(\mathbb{R}),P)$ with $P = \mu+\lambda$ and $L^2(\mathbb{R},dP)$. With
spectral measures in mind, we can assume that $\mu(\mathbb{R})<\infty$, which implies that, for
any $\mu$-integrable $f$, we have

\[ \int|f|\,d\mu \le \mu(\mathbb{R})^{1/2}\Big(\int|f|^2d\mu\Big)^{1/2} \le \mu(\mathbb{R})^{1/2}\Big(\int|f|^2dP\Big)^{1/2} . \]

Hence, $\ell(f) := \int f\,d\mu$ defines a bounded linear functional on $L^2(\mathbb{R},dP)$, and according
to Theorem 13.2, there is a unique vector $g\in L^2(\mathbb{R},dP)$ with

\[ \ell(f) = \int f\,d\mu = \int fg\,dP = \int fg\,d\mu + \int fg\,d\lambda , \]

or, rearranged,

\[ \int f(1-g)\,d\mu = \int fg\,d\lambda . \]

Let $G_1$ be the set on which $g\ge1$. Then, by inserting $f = \chi_{G_1}\in L^2(\mathbb{R},dP)$ into
this equation, it follows that $\lambda(G_1) = 0$. Analogously, one sees that the set $G_2$ with
$g<0$ is a Lebesgue null set. The assumption of absolute continuity then implies
that $\mu(G_1\cup G_2) = 0$. Let $G = G_1\cup G_2$. Then for $A\in\mathcal{B}(\mathbb{R})$ and

\[ f = \frac{1}{1-g}\,\chi_{A\setminus G} , \]

we find that

\[ \int\chi_A\,d\mu = \int\chi_A\,\rho\,d\lambda \]

with the non-negative density

\[ \rho = \frac{g}{1-g} . \]
The next observation is that any regular Borel measure $\mu$ can be split into a part $\mu_{\mathrm{ac}}$
which is absolutely continuous with respect to Lebesgue measure and a part $\mu_{\mathrm{sing}}$
which is singular and contains, in particular, its pure point part. This is easy to see.
Trivially, we have $\mu\ll\mu+\lambda$, and hence there is a density $\rho$ such that

\[ \mu(A) = \int_A\rho\,d\mu + \int_A\rho\,d\lambda , \]

with $\rho\le1$. Let $F = \{x \mid \rho(x)<1\}$ and $F^c = \{x \mid \rho(x) = 1\}$. Then clearly $\lambda(F^c) = 0$,
and we set $\mu_{\mathrm{ac}}(A) = \mu(A\cap F)$ and $\mu_{\mathrm{sing}}(A) = \mu(A\cap F^c)$. Note that $\mu_{\mathrm{ac}}\ll\lambda$,
since, for $\lambda(N) = 0$ and $\mu_{\mathrm{ac}}(N) = \mu(N\cap F)>0$, we would get the contradiction
that

\[ \mu(N\cap F) = \int_{N\cap F}\rho\,d\mu + \int_{N\cap F}\rho\,d\lambda = \int_{N\cap F}\rho\,d\mu < \mu(N\cap F) . \]

Hence, $\mu_{\mathrm{ac}}(N) = 0$.

Now we can subtract the pure point part from the singular part and define the singular
continuous part of a regular Borel measure as $\mu_{\mathrm{sc}} = \mu_{\mathrm{sing}}-\mu_{\mathrm{pp}}$. In summary,
we get the decomposition

\[ \mu = \mu_{\mathrm{pp}} + \mu_{\mathrm{ac}} + \mu_{\mathrm{sc}} . \tag{15.46} \]
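To make the decomposition (15.46) concrete, here is a simple example of our own choosing (it does not appear in the text). Let $\delta_0$ be the Dirac measure at $0$, let $\lambda$ be Lebesgue measure, and let $\mu_C$ denote the Cantor measure, i.e., the Lebesgue-Stieltjes measure of the Cantor function. The Cantor measure has no atoms, yet it is concentrated on the Cantor set $C$, which is a Lebesgue null set. Then:

```latex
\[
  \mu = \delta_0 + \chi_{[1,2]}\,\lambda + \mu_C ,
\]
\[
  \mu_{\mathrm{pp}} = \delta_0 , \qquad
  \mu_{\mathrm{ac}} = \chi_{[1,2]}\,\lambda \quad (\text{density } \rho = \chi_{[1,2]}) , \qquad
  \mu_{\mathrm{sc}} = \mu_C .
\]
```

In this example the set of atoms is $P = \{0\}$, and the singular part $\mu_{\mathrm{sing}} = \delta_0 + \mu_C$ lives on the Lebesgue null set $\{0\}\cup C$.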


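Let us return now to the spectral measure $\mu_\eta$ of a cyclic vector $\eta$. Its decomposition
immediately yields a decomposition of the Hilbert space in the form

\[ L^2\big(\sigma(H),d\mu_\eta\big) = L^2\big(\sigma(H),d\mu_\eta^{\mathrm{pp}}\big)\oplus L^2\big(\sigma(H),d\mu_\eta^{\mathrm{ac}}\big)\oplus L^2\big(\sigma(H),d\mu_\eta^{\mathrm{sc}}\big) , \]

and by inverting the unitary map $U$ from $\mathcal{H}$ to $L^2(\sigma(H),d\mu_\eta)$, we get the decomposition

\[ \mathcal{H} = \mathcal{H}_{\mathrm{pp}}\oplus\mathcal{H}_{\mathrm{ac}}\oplus\mathcal{H}_{\mathrm{sc}} . \tag{15.47} \]

We can also understand this in the following way. Suppose for example that $\varphi\in
\mathcal{H}_{\mathrm{ac}}$. Then the corresponding spectral measure $\mu_\varphi$ is absolutely continuous. This is
because $f = U\varphi\in L^2(\sigma(H),d\mu_\eta^{\mathrm{ac}})$, and therefore, with (15.41),

\[ d\mu_\varphi(\lambda) = |f(\lambda)|^2\,d\mu_\eta^{\mathrm{ac}}(\lambda) = |f(\lambda)|^2\rho(\lambda)\,d\lambda . \]

This shows in particular that the ambiguity in the spectral measure (the cyclic vector
is never unique) has no influence on the decomposition (15.47). The latter is unique!

Finally, one can also decompose the spectrum into different, not necessarily disjoint
components. One defines

\[
\begin{aligned}
\sigma_{\mathrm{pp}}(H) &:= \sigma\big(H|_{\mathcal{H}_{\mathrm{pp}}}\big) = \operatorname{supp}\big(\mu_\eta^{\mathrm{pp}}\big) , \\
\sigma_{\mathrm{ac}}(H) &:= \sigma\big(H|_{\mathcal{H}_{\mathrm{ac}}}\big) = \operatorname{supp}\big(\mu_\eta^{\mathrm{ac}}\big) , \\
\sigma_{\mathrm{sc}}(H) &:= \sigma\big(H|_{\mathcal{H}_{\mathrm{sc}}}\big) = \operatorname{supp}\big(\mu_\eta^{\mathrm{sc}}\big) .
\end{aligned}
\]

Here the pure point spectrum $\sigma_{\mathrm{pp}}$ is the closure of the set of eigenvalues, where the
latter was previously denoted by $\sigma_{\mathrm{p}}$.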

Remark 15.12. On the Spectrum of Unitarily Equivalent Operators

Let $S$ and $T$ be self-adjoint, and let $T = USU^*$ with $U$ unitary. Then, clearly, $\sigma(S) =
\sigma(T)$ because $\lambda-S$ is invertible if and only if $\lambda-T = U(\lambda-S)U^*$ is invertible.
Let us go quickly through an argument which shows that even more is true, viz., the
spectral types agree. This follows once we show that the PVMs $P^S$ of $S$ and $P^T$ of $T$
transform according to

\[ P^T = UP^SU^* , \tag{15.48} \]

since the spectral measures then agree:

\[ \mu_\psi^T(A) = \langle\psi|P_A^T\psi\rangle = \langle U^*\psi|P_A^SU^*\psi\rangle = \mu_{U^*\psi}^S(A) . \]

For (15.48), recall that $P_A^S = \chi_A(S)$ and $P_A^T = \chi_A(T)$. Using (15.39) and approximating
as in Theorem 15.4 (v), we find that (15.48) follows from $(T-z)^{-1} =
U(S-z)^{-1}U^*$. □


Now back to physics. Think of the Schrödinger operator $H$. The subspace $\mathcal{H}_{\mathrm{pp}}$ contains
linear combinations of eigenfunctions. These are the bound states, i.e., states
which stay within bounded regions during the time evolution. In $\mathcal{H}_{\mathrm{ac}}$, there are the
wave functions which spread and propagate to infinity, while in $\mathcal{H}_{\mathrm{sc}}$, there are all
the wave functions which behave neither way, and which one would like to ignore
altogether. As an example let us compute $\langle\varphi|e^{-iHt}\psi\rangle$ for $\psi\in\mathcal{H}_{\mathrm{ac}}$ and $\varphi\in\mathcal{H}$:

\[ \langle\varphi|e^{-iHt}\psi\rangle = \int e^{-i\lambda t}\,d\mu_{\varphi,\psi}(\lambda) = \int e^{-i\lambda t}\rho_{\varphi,\psi}(\lambda)\,d\lambda = \sqrt{2\pi}\,\hat\rho_{\varphi,\psi}(t) \to 0 \quad\text{as } t\to\infty , \]

where we have used the fact that, like $\mu_\psi$, $\mu_{\varphi,\psi}$ is also absolutely continuous, and
we have applied the Riemann-Lebesgue lemma. So the overlap of $\psi(t) = e^{-iHt}\psi$
with any fixed wave function $\varphi$ goes to zero for $t\to\infty$, or, in other words, $\psi(t)$ goes
to zero weakly. While this does not yet show that $\psi(t)$ goes to spatial infinity, it
gives a first idea of how spectral and dynamical properties are related.
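The dynamical difference between absolutely continuous and pure point spectral measures can be illustrated with the simplest possible examples. In the sketch below, which rests on assumptions of ours, the absolutely continuous measure has density $\rho = \chi_{[0,1]}$, for which $\int e^{-i\lambda t}\rho(\lambda)\,d\lambda = (1-e^{-it})/(it)$ is bounded in modulus by $2/t$, while the point measure is a single Dirac mass at $\lambda_0$. The function names are hypothetical.

```python
import numpy as np

def overlap_ac(t, points=1_000_000):
    """Integral of exp(-i lam t) rho(lam) d lam for the density rho = 1 on
    [0, 1], evaluated with the midpoint rule."""
    lam = (np.arange(points) + 0.5) / points
    return np.exp(-1j * lam * t).mean()

def overlap_pp(t, lam0=0.5):
    """The same integral for the point measure delta(lam - lam0)."""
    return np.exp(-1j * lam0 * t)

times = [10.0, 100.0, 1000.0]
ac = [abs(overlap_ac(t)) for t in times]   # decays at least like 2/t
pp = [abs(overlap_pp(t)) for t in times]   # stays equal to 1
```

The overlap for the point measure only rotates in phase, which is the finite-dimensional shadow of a bound state, whereas the absolutely continuous overlap decays as the Riemann-Lebesgue lemma requires.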

There is an area of mathematical physics called scattering theory, which makes
this picture much more precise for the case of spatially decaying interactions. In
particular one tries to show that states in $\mathcal{H}_{\mathrm{ac}}$ move asymptotically according to
the "free" dynamics, and such states are called scattering states. As a byproduct,
one often obtains $\mathcal{H}_{\mathrm{sc}} = \{0\}$, and such Hamiltonians are said to be asymptotically
complete. Among other more important things, this will be touched upon in the next
and final chapter.

Chapter 16
Bohmian Mechanics on Scattering Theory


The quantum equilibrium distribution tells us the probability for a system to be in a
certain configuration at a given time $t$. That is the basis for the quantum formalism
of POVMs, PVMs, and self-adjoint observables on a Hilbert space. In this last chapter
we shall return to the beginning of it all, namely to Born's 1926 papers [1, 2], in
which he applies Schrödinger's wave equation to a scattering situation. In this
application, Born recognized the importance of the quantum equilibrium distribution
$\rho = |\psi|^2$ as the distribution of the random position of the particle after scattering.

Curiously though, the application of quantum mechanics to scattering situations 
comes along with a shift of emphasis on the meaning of the quantum equilibrium 
distribution. In scattering theory, the crossing probability of spacetime surfaces be- 
comes meaningful. This is naively clear when one pictures the scattering situation as 
in Fig. 16.1. There are detectors surrounding the scattering potential. The question 
is: What is the probability that the detector will click? 

Prior to answering that one must first clarify the following question: Is the time at 
which a detector clicks a fixed given time, i.e., a time the experimenter can choose? 
Intuitively and correctly, the answer is no. The time is random. The detector clicks 
when the particle arrives at the detector surface and crosses it. So both the where 
and the when of the detection event are random. 

It is immediately clear that these are questions which Bohmian mechanics is
tailor-made to answer, since the notion of where and when the particle crosses a
surface is a natural one when trajectories exist. It is another matter to find a closed 
formula for the crossing probability. A nice formula can be found when one con- 
siders the scattering regime, which is a space regime where the particles move es- 
sentially along straight lines. There are plenty of books on scattering theory, and we 
shall not invent scattering theory anew. But true to our intention to provide a clear 
ontological picture and true to our maxim expressed by Melville: 


While you take in hand to school others, and to teach them by what name a whale-fish is 

to be called in our tongue leaving out, through ignorance, the letter H, which almost alone 

maketh the signification of the word, you deliver that which is not true. - HACKLUYT 
Melville (1851), Moby Dick, Chap. 32 [3]

Fig. 16.1 A scattering experiment. A particle with wave function $\psi$ is sent to a target, here a
potential $V$. The detectors sit far away from the target and wait for the particle to arrive. The
scattering experiment is in principle very much like the first-exit experiment in Fig. 16.2, with the
difference that, in a scattering experiment, the particle comes into the target region from far away
and the detection is far away from the target


we shall elaborate on surface-crossing probabilities for Bohmian trajectories. Then 
following this line of thought, we shall examine the essential elements of scattering 
theory, until we end with Born’s formula for the scattering cross-section. 


16.1 Exit Statistics 


Consider the experiment sketched in Fig. 16.2. A particle is located within the region
$G$ at time $t = 0$. The wave function of the particle is $\psi$ and $\operatorname{supp}\psi \subset G$ at time $t = 0$.
The wave function obeys Schrödinger’s equation. When and where does the particle
leave $G$? That question is in general not well posed, because the particle can leave
and reenter the region $G$. A good question would be: When and where does the
particle leave the region for the first time? In other words, when and where does the
particle cross the boundary $\partial G$ of $G$ for the first time?

This problem is not the common textbook problem of quantum mechanics. The 
reason is that time is not an observable. There are various arguments for that, but 
they are not important for us, since we have learnt that any experiment which ends 
with pointers pointing to values has POVM statistics. The same holds here. More- 
over, we shall have no need to address the question as to whether a unique form of 
the associated POVM exists, and if it does, what it is. Why should we worry about 
the POVM, which by its very meaning handles all wave functions? If one is inter- 
ested only in very special wave functions, as we are, for example, those which are 
well localized at time zero within G, we do not need an abstract formalism.


Fig. 16.2 Experiment to determine the exit position and exit time from a region G. A detector 
clicks at a random position and random time 


Another question which might be worrisome is whether the interaction with the 
detectors must be taken into account to obtain the observed scattering statistics. In 
other words, why is the unmeasured crossing probability equal to the measured one? 
Detectors interact with the particle and interaction means change of wave function, 
which means change of trajectory, which means change of exit statistics. All this is 
true. So the experimenter must be careful with the choice of detectors, so that the 
effect of the detection on the wave function is small enough, or of such a quality, 
that the trajectories are not substantially altered. This may be particularly important 
when measuring first-exit times of the general kind we discuss below. It is presum- 
ably less important in the scattering situation which we discuss after that, because 
in scattering setups, the interaction energy between the particles and the detector is 
small compared with the energy of the particles. Having said this, we shall follow 
the common practice of quantum physics and henceforth not worry about the pres- 
ence of detectors, simply taking it for granted that the detection is designed in such 
a way that it does not mess up the trajectories too much. 

Let us now start with a simple argument which deals with the first-exit time of
the particle from $G$. That time is a random variable on the initial positions within
the support of the initial wave function $\psi$. It is defined by

$$\tau(x) = \inf\{\, t \mid X(t,x) \notin G,\ x \in \operatorname{supp}\psi \,\} . \tag{16.1}$$


Then, using the quantum equilibrium distribution, we can attempt to compute the
“distribution function”

$$
\begin{aligned}
\mathbb{P}^\psi(\{x \mid \tau(x) > t\}) &= \mathbb{P}^\psi(\{x \mid X(s,x) \in G \text{ for all } s \le t\}) \\
&\text{“=”}\ \mathbb{P}^\psi(\{x \mid X(t,x) \in G\}) \\
&= \mathbb{P}^{\psi_t}(G) \\
&= \int_G |\psi(x,t)|^2\, d^3x .
\end{aligned} \tag{16.2}
$$


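This is simple enough, but the second equality is only correct if the particle never
returns once it has left $G$. The third equality is equivariance. As usual one obtains
the exit-time density $\rho^\psi_\tau(t)$ from the distribution function $\mathbb{P}^\psi(\tau > t)$ of $\tau$ by
differentiation:

$$
\begin{aligned}
\rho^\psi_\tau(t) &= -\frac{d}{dt}\mathbb{P}^\psi(\tau > t) = -\int_G \frac{\partial |\psi(x,t)|^2}{\partial t}\, d^3x \\
&= -\frac{\partial}{\partial t}\langle \psi_t | O_G \psi_t\rangle \\
&= -\frac{i}{\hbar}\big(\langle H\psi_t | O_G \psi_t\rangle - \langle \psi_t | O_G H\psi_t\rangle\big) .
\end{aligned} \tag{16.3}
$$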


We have introduced the position PVM $O_G$ to make it look like advanced quantum
mechanics. We have also used the notation $\psi_t$ to highlight the time dependence of
the wave function $\psi(x,t)$. For $[t_1,t_2] \subset \mathbb{R}^+$, we get

$$\mathbb{P}^\psi(\tau \in [t_1,t_2]) = \int_{t_1}^{t_2} \rho^\psi_\tau(t)\, dt .$$

So we have a bilinear form, but is it positive? Do we get a POVM? In fact, we do
not, because of the inverted commas on one of the equals signs in (16.2).
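A case in which the inverted commas in (16.2) can be dropped is a freely spreading Gaussian packet at rest in one dimension, whose Bohmian trajectories simply rescale with the width of the packet and therefore never return. The sketch below compares the fraction of trajectories with $\tau(x) > t$ against $\int_G |\psi_t|^2\,dx$. The assumptions are ours: units with $\hbar = m = 1$, the region $G = (-L, L)$, the values of `SIGMA0` and `L`, and all function names. Note also that the Gaussian tails slightly violate $\operatorname{supp}\psi \subset G$; starting points outside $G$ are assigned exit time zero.

```python
import math
import random

SIGMA0 = 1.0   # assumed initial standard deviation of |psi_0|^2 (hbar = m = 1)
L = 3.0        # assumed half-width of the region G = (-L, L)

def spread(t):
    """Standard deviation of |psi_t|^2 for a free Gaussian packet at rest."""
    return SIGMA0 * math.sqrt(1.0 + (t / (2.0 * SIGMA0 ** 2)) ** 2)

def trajectory(t, x0):
    """Bohmian trajectory X(t, x0): the initial position is rescaled by the spreading."""
    return x0 * spread(t) / SIGMA0

def first_exit_time(x0):
    """tau(x0) = inf{t >= 0 : |X(t, x0)| >= L}; infinite for x0 = 0, zero if already outside."""
    if x0 == 0.0:
        return math.inf
    ratio = L / abs(x0)
    if ratio <= 1.0:
        return 0.0
    return 2.0 * SIGMA0 ** 2 * math.sqrt(ratio ** 2 - 1.0)

def survival_from_density(t):
    """Integral of |psi_t|^2 over G, the right-hand side of (16.2)."""
    return math.erf(L / (math.sqrt(2.0) * spread(t)))

def survival_from_trajectories(t, samples=20000, seed=0):
    """Fraction of |psi_0|^2-distributed starting points whose first-exit time exceeds t."""
    rng = random.Random(seed)
    hits = sum(first_exit_time(rng.gauss(0.0, SIGMA0)) > t for _ in range(samples))
    return hits / samples

if __name__ == "__main__":
    for t in (0.0, 2.0, 5.0, 10.0):
        print(t, survival_from_trajectories(t), survival_from_density(t))
```

The two columns agree up to sampling error. For a wave function whose trajectories can reenter $G$, the trajectory count would in general be smaller than the density integral.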

Returning to (16.3), we introduce the quantum flux $j^\psi = |\psi|^2 v^\psi$, recalling the
quantum flux equation (7.17). Using Gauss’s theorem with $dS$ as oriented surface
element of $\partial G$,

$$-\int_G \frac{\partial |\psi(x,t)|^2}{\partial t}\, d^3x = \int_G \nabla\cdot j^\psi(x,t)\, d^3x = \int_{\partial G} j^\psi(x,t)\cdot dS .$$

Therefore,

$$\int_{t_1}^{t_2} \rho^\psi_\tau(t)\, dt = \int_{t_1}^{t_2}\int_{\partial G} j^\psi(x,t)\cdot dS\, dt$$

is the “net flow” through the spacetime surface $\partial G \times [t_1,t_2]$, which can be negative.
It can be interpreted as a probability if the following positivity condition holds.
Suppose we define the normal vectors to $\partial G$ as pointing outwards. Then the condition reads

$$j^\psi(x,t)\cdot dS > 0 , \quad \text{for all } (x,t) \in \partial G \times [t_1,t_2] . \tag{16.4}$$

Fig. 16.3 Signed crossings of the boundary $\partial G$ of $G$


If (16.4) holds, then

$$
\begin{aligned}
j^\psi(x,t)\cdot dS\, dt &= -\frac{i\hbar}{2m}\big[\psi^*(x,t)\nabla\psi(x,t) - \psi(x,t)\nabla\psi^*(x,t)\big]\cdot dS\, dt \\
&= \text{“crossing probability of the surface element } dS \text{ in } dt\text{”} .
\end{aligned} \tag{16.5}
$$

Now this is news. But can we trust it? What does the flux integral mean if (16.4)
does not hold?¹ Bohmian mechanics gives the answers. Let us look at the Bohmian
trajectories.

The Bohmian trajectories $(X(t))_{t\ge 0}$ are randomly distributed according to the random
distribution of initial values $X(0)$, which are $|\psi(0)|^2$-distributed. The trajectories
define the random number $N(\Delta\partial G, \Delta t)$ of crossings of $(X(t))_{t\ge 0}$ through $\Delta\partial G$
within the time interval $\Delta t$. This number is a function of the initial positions of
the trajectories, and inherits its randomness from the $|\psi|^2$-distribution of the initial
positions. The number is naturally decomposable into two random numbers

$$N(\Delta\partial G, \Delta t) = N_+(\Delta\partial G, \Delta t) + N_-(\Delta\partial G, \Delta t) ,$$

with $N_+(\Delta\partial G, \Delta t)$ as outward crossings and $N_-(\Delta\partial G, \Delta t)$ as returning inward crossings
of $\Delta\partial G$ within $\Delta t$, as shown in Fig. 16.3. The number of signed crossings is the
difference

$$N_s(\Delta\partial G, \Delta t) := N_+(\Delta\partial G, \Delta t) - N_-(\Delta\partial G, \Delta t) .$$

We now cut the set $\Delta\partial G \times \Delta t$ into small pieces $\Delta\partial G_j \times \Delta t_i$, which are assumed
to be so small that they can only be crossed once, either positively or negatively:
$N_\pm(\Delta\partial G_j, \Delta t_i) \in \{0,1\}$. Then the total number $N(\Delta\partial G_j, \Delta t_i)$ becomes the indicator
function of Boltzmann’s collision cylinder, as we shall explain now.


¹ The positivity condition on the quantum flux and the relation to mathematical scattering theory
are discussed further in [4].

Boltzmann’s statistical mechanics argument, which is crucial for crossing probabilities,
observes that the particle can only cross $\Delta\partial G_j$ within the time interval $\Delta t_i$
if it is at time $t_i$ in the volume (the collision cylinder²)

$$\Delta C_{i,j} = |v^{\psi_{t_i}}\Delta t_i\cdot\Delta S_j| .$$

Therefore

$$N(\Delta\partial G_j, \Delta t_i) = \chi_{\Delta C_{i,j}}(X(t_i)) .$$

The probability for the particle to be in the cylinder is given by the quantum equilibrium
(qu. eq.) distribution and thus by equivariance

$$
\begin{aligned}
\mathbb{E}^\psi\big(N(\Delta\partial G_j, \Delta t_i)\big) &= \mathbb{E}^\psi\big(\chi_{\Delta C_{i,j}}(X(t_i))\big) \\
&= \mathbb{P}^{\psi_{t_i}}(\Delta C_{i,j}) \\
&\overset{\text{qu. eq.}}{=} |\psi_{t_i}|^2\,|v^{\psi_{t_i}}\Delta t_i\cdot\Delta S_j| = |j^{\psi_{t_i}}\cdot\Delta S_j|\,\Delta t_i .
\end{aligned}
$$

Using

$$N(\Delta\partial G, \Delta t) = \sum_{i,j} N(\Delta\partial G_j, \Delta t_i) ,$$

and by linearity of the expectation value, we can compute the expectation value of
the total number of crossings through the boundary of $G$:

$$\mathbb{E}^\psi\big(N(\Delta\partial G, [t_1,t_2])\big) = \int_{\Delta\partial G\times[t_1,t_2]} |j^\psi\cdot dS|\, dt .$$

Observing that we obtain a $-1$ from the scalar product of the velocity and the surface
normal when a trajectory returns, we likewise obtain the expectation of the number
of signed crossings $N_s(\Delta\partial G, [t_1,t_2])$:

$$\mathbb{E}^\psi\big(N_s(\Delta\partial G, [t_1,t_2])\big) = \int_{\Delta\partial G\times[t_1,t_2]} j^\psi\cdot dS\, dt . \tag{16.6}$$


The flux integrated across a surface and over some time interval is therefore in general
the expected number of signed crossings of the trajectories of the surface in that
time interval. It is important to understand that it is the expected value of signed
crossings and not the probability. The expectation is additive, while the probability
is not in general. The crossing probability is in general not additive because a trajectory
may cross the surface more than once. Therefore the events where a trajectory
crosses the surface $\Delta\partial G$, say at time $t_1$ and at time $t_2$, are not disjoint, as one immediately
sees when one follows the trajectory back in time to its starting region. Hence
the probabilities of these two events do not add to give the corresponding crossing
probability.

² The name “collision” comes from its use in the statistical mechanics of interacting particles. In
our context, “collision” means “crossing”.
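The difference between an expected number of crossings and a crossing probability can be made concrete with a toy ensemble. The sketch below is purely illustrative: the three sampled paths and their weights are hypothetical (they are not Bohmian trajectories of any particular wave function), the region $G$ is taken to be the half-line $x < 1$, and the function names are ours.

```python
def count_crossings(path, boundary):
    """Count outward (N_plus) and inward (N_minus) crossings of x = boundary
    for a sampled one-dimensional path; the region G is x < boundary."""
    n_plus = n_minus = 0
    for before, after in zip(path, path[1:]):
        if before < boundary <= after:
            n_plus += 1        # the path leaves G
        elif after < boundary <= before:
            n_minus += 1       # the path returns to G
    return n_plus, n_minus

def ensemble_statistics(ensemble, boundary):
    """Return (E[N], E[N_s], P(at least one crossing)) for weighted sample paths."""
    e_total = e_signed = p_cross = 0.0
    for path, weight in ensemble:
        n_plus, n_minus = count_crossings(path, boundary)
        e_total += weight * (n_plus + n_minus)
        e_signed += weight * (n_plus - n_minus)
        if n_plus + n_minus > 0:
            p_cross += weight
    return e_total, e_signed, p_cross

# Hypothetical ensemble: (sampled path, probability weight).
ENSEMBLE = [
    ([0.0, 0.2, 0.4, 0.3, 0.1], 0.5),   # never leaves G
    ([0.0, 0.6, 1.2, 1.8, 2.4], 0.3),   # leaves once
    ([0.0, 1.2, 0.8, 1.3, 1.9], 0.2),   # leaves, returns, leaves again
]

if __name__ == "__main__":
    print(ensemble_statistics(ENSEMBLE, 1.0))
```

Here the expected total number of crossings exceeds the probability that the boundary is crossed at all, because the third path is counted three times. In this one-dimensional toy the signed count happens to agree with the crossing probability; for a piece $\Delta\partial G$ of a surface in higher dimensions this need not be so.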

Additivity holds, however, when each trajectory crosses only once, which is ensured
by the positivity condition (16.4). Assuming $j^\psi\cdot dS > 0$ for all times means
assuming that the scalar product between the Bohmian velocity and the surface element
is positive, so that the trajectory crosses from inside to outside. This in turn
means that the trajectory can cross the surface only once, whence the signed number
equals the total number of crossings, which is either zero or unity. Then denoting
the first-exit time from $G$ by $\tau(x)$ [see (16.1)], and denoting the first-exit position
by $X_e := X(\tau(x), x)$, where $x$ is the starting point of the trajectory, we obtain

$$
\begin{aligned}
\mathbb{E}^\psi\big(N(\Delta\partial G, [t_1,t_2])\big) &= \mathbb{E}^\psi\big(N_s(\Delta\partial G, [t_1,t_2])\big) \\
&= 0\cdot\mathbb{P}^\psi(\tau \notin [t_1,t_2] \text{ or } X_e \notin \Delta\partial G) \\
&\quad + 1\cdot\mathbb{P}^\psi(\tau \in [t_1,t_2] \text{ and } X_e \in \Delta\partial G) .
\end{aligned}
$$

Hence, by virtue of (16.6) and the positivity condition (16.4), the crossing probability
is

$$\mathbb{P}^\psi(X_e \in \Delta\partial G,\ \tau \in [t_1,t_2]) = \mathbb{E}^\psi\big(N_s(\Delta\partial G, [t_1,t_2])\big) = \int_{t_1}^{t_2}\int_{\Delta\partial G} j^\psi\cdot dS\, dt . \tag{16.7}$$
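Formula (16.7) can be checked in a simple case where the positivity condition holds: a free Gaussian packet at rest in one dimension, leaving the interval $G = (-L, L)$. The sketch below integrates the outward flux through the two boundary points over a time window and compares it with the drop in $\int_G |\psi_t|^2\,dx$. The assumptions are ours: units with $\hbar = m = 1$, the values of `SIGMA0` and `L`, the midpoint quadrature, and all function names.

```python
import math

SIGMA0 = 1.0   # assumed initial standard deviation of |psi_0|^2 (hbar = m = 1)
L = 3.0        # assumed half-width of the region G = (-L, L)

def spread(t):
    """Standard deviation of |psi_t|^2 for a free Gaussian packet at rest."""
    return SIGMA0 * math.sqrt(1.0 + (t / (2.0 * SIGMA0 ** 2)) ** 2)

def density(x, t):
    """|psi(x, t)|^2, a centred Gaussian of width spread(t)."""
    s = spread(t)
    return math.exp(-x * x / (2.0 * s * s)) / (math.sqrt(2.0 * math.pi) * s)

def velocity(x, t):
    """Bohmian velocity field v = x * (d spread / dt) / spread for this packet."""
    a = t / (2.0 * SIGMA0 ** 2)
    return x * (a / (2.0 * SIGMA0 ** 2)) / (1.0 + a * a)

def outward_flux(t):
    """j . dS summed over the boundary points x = +L and x = -L (outward normals)."""
    return density(L, t) * velocity(L, t) - density(-L, t) * velocity(-L, t)

def flux_integral(t1, t2, n=4000):
    """Midpoint rule for the time integral of the boundary flux, right-hand side of (16.7)."""
    h = (t2 - t1) / n
    return h * sum(outward_flux(t1 + (k + 0.5) * h) for k in range(n))

def exit_probability(t1, t2):
    """P(tau in [t1, t2]) obtained from the probability of still being inside G."""
    inside = lambda t: math.erf(L / (math.sqrt(2.0) * spread(t)))
    return inside(t1) - inside(t2)

if __name__ == "__main__":
    print(flux_integral(1.0, 8.0), exit_probability(1.0, 8.0))
```

The flux is nonnegative at both boundary points for all $t \ge 0$, so the two numbers agree up to quadrature error.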


If the positivity condition does not hold, the flux integral on the right of (16.7) is 
in general not positive, and the exit probability is not given by a simple expression. 
It is easy to see that the set of “good” wave functions which satisfy the positivity 
condition (16.4) is not a linear set, which means that the superposition of two “good” 
wave functions is not in general a “good” wave function. A first-exit statistics POVM 
will only be given by the flux on “good” wave functions, and it is not clear what it 
will look like in general. Thus we have an example of a measurement situation where 
we are only interested in the statistics of particular wave functions, and where we do 
not care at all about the general quantum formalism. Is this in any way problematic? 
Of course, it is not! 

To help appreciate the fact that Bohmian mechanics yields in the most straight- 
forward manner that the crossing probability is determined by the quantum flux, we 
contrast this with other versions of quantum theories with trajectories. For example,
in stochastic mechanics [5, 6], where the trajectories are like Brownian motion
paths, the distribution of the first-exit time will bear no relation whatsoever to the
quantum flux.


Remark 16.1. A Four-Dimensional View 

The quantum flux equation (7.17) for one particle can be viewed (like any continuity
equation) as the assertion that the current $j^\mu$, $\mu = 0,1,2,3$, $(j^\mu) = (\rho^\psi, j^\psi)$, is
divergence-free:

$$\partial_\mu j^\mu = 0 . \tag{16.8}$$


In this four-dimensional view, a four-current $j^\mu$ defines the particle worldlines $X^\mu$
by requiring that these worldlines are integral curves along the four-current vector
field. Choosing an appropriate parametrization, we can write

$$\frac{dX^\mu}{ds} = j^\mu(X^\nu) .$$

The four-current version of the quantum flux equation gives another perspective on
equivariance. Let $\mathcal{S}$ denote a (three-dimensional) smooth hypersurface in spacetime,
which is crossed only once by each worldline. If the worldlines do not turn
backward in time, as is the case of interest here, the assumption is simply that the
surface has a timelike normal at each point. Then for a subset $\Delta\mathcal{S} \subset \mathcal{S}$, we read

$$\int_{\Delta\mathcal{S}} j\cdot d\sigma = \mathbb{P}^\psi(\Delta\mathcal{S} \text{ is crossed by a worldline}) , \tag{16.9}$$

where we take the probability as being unity on $\mathcal{S}$.

This reading of the integrated four-current as crossing probability is consistent,
because the analogous statement holds for any other surface $\mathcal{S}'$ which is also
crossed only once by each trajectory, and the two probabilities are connected by
the current. To see this, consider the worldline cylinder $C$ with base $\Delta\mathcal{S}$ and cap
$\Delta\mathcal{S}' \subset \mathcal{S}'$, which is the image of $\Delta\mathcal{S}$ under the flow map given by the four-current.
In other words, the lateral surface of the cylinder is made up of worldlines. The
cylinder surface is taken to be oriented with outward pointing normal vectors. Then,
by Gauss’s theorem, and by virtue of (16.8), we obtain

$$0 = \int_C \partial_\mu j^\mu\, d^4x = \int_{\partial C} j\cdot d\sigma = -\int_{\Delta\mathcal{S}} j\cdot d\sigma + \int_{\Delta\mathcal{S}'} j\cdot d\sigma , \tag{16.10}$$

where, in the last equality, we used the fact that the current is tangent to the lateral
surface of the cylinder (by construction of the worldline cylinder), i.e., orthogonal to
its normal, so that the lateral surface does not contribute. The minus sign appears
because the outward normal of the cylinder on the base is opposite to the direction in
which the worldlines cross $\Delta\mathcal{S}$. Hence the
crossing probability is preserved when the current is divergence-free.

In Bohmian mechanics the natural type of surface $\mathcal{S}$ is an $x^0 = t = \text{const.}$ hyperplane
with normal $(1,0,0,0)$, and for that (16.9) yields the quantum equilibrium
distribution, i.e., Born’s statistical law, because $j\cdot d\sigma = |\psi_t|^2 d^3x$. Furthermore, the
expression always has the same form, a property which we called equivariance.

But we can also consider skewed hypersurfaces. In the above, we considered
surfaces given by the parametrization

$$\Sigma(x^0,u,v) = \big(x^0, x^1(u,v), x^2(u,v), x^3(u,v)\big) \in \mathbb{R}^4 , \quad x^0 \in [t_1,t_2] ,\ (u,v) \in E \subset \mathbb{R}^2 .$$

The scalar product $j\cdot d\Sigma$ can be computed as the determinant of the enlarged four-dimensional
matrix made from the Jacobi matrix of $\Sigma$ and $j$ as fourth column. [Recall
that, if instead we list the canonical unit vectors $e_k$, $k = 0,1,2,3$, as fourth
column, we obtain the surface normal in $\Sigma(x^0,u,v)$ by expanding the determinant
with respect to the fourth column.] With $\mathbf{x}(u,v) = (x^1(u,v), x^2(u,v), x^3(u,v)) \in \mathbb{R}^3$,
we obtain

$$
j\cdot d\Sigma = \det\begin{pmatrix}
1 & 0 & 0 & \rho \\
0 & \partial_u x^1 & \partial_v x^1 & j^1 \\
0 & \partial_u x^2 & \partial_v x^2 & j^2 \\
0 & \partial_u x^3 & \partial_v x^3 & j^3
\end{pmatrix} dt\,du\,dv = \mathbf{j}\cdot(\partial_u\mathbf{x}\times\partial_v\mathbf{x})\, dt\,du\,dv .
$$
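The determinant formula is easy to test on numbers. The following sketch builds the $4\times 4$ matrix with columns $\partial_0\Sigma$, $\partial_u\Sigma$, $\partial_v\Sigma$ and $(\rho, \mathbf{j})$ and compares its determinant with $\mathbf{j}\cdot(\partial_u\mathbf{x}\times\partial_v\mathbf{x})$. The sample vectors are arbitrary and the helper names are ours; the Laplace expansion is chosen for transparency, not efficiency.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def cross(a, b):
    """Cross product of two 3-vectors given as lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    """Euclidean scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def flux_through_patch(rho, j, du_x, dv_x):
    """Determinant of the matrix with columns d_0 Sigma, d_u Sigma, d_v Sigma, (rho, j)."""
    columns = [[1.0, 0.0, 0.0, 0.0], [0.0] + du_x, [0.0] + dv_x, [rho] + j]
    rows = [[column[i] for column in columns] for i in range(4)]
    return det(rows)

if __name__ == "__main__":
    du_x, dv_x = [1.0, 2.0, 0.5], [-0.3, 1.0, 2.0]   # arbitrary tangent vectors
    j = [0.4, -1.1, 0.9]                              # arbitrary spatial current
    print(flux_through_patch(0.5, j, du_x, dv_x), dot(j, cross(du_x, dv_x)))
```

The value does not depend on $\rho$: for surfaces of the form $\partial G \times [t_1,t_2]$ only the spatial current contributes.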


In the language of forms, $j$ is a 3-form

$$\omega_j = \rho\, dx^1\wedge dx^2\wedge dx^3 - j^1 dx^0\wedge dx^2\wedge dx^3 + j^2 dx^0\wedge dx^1\wedge dx^3 - j^3 dx^0\wedge dx^1\wedge dx^2 .$$

The 3-form is by its very meaning an object which is to be integrated over a three-surface,
and by definition,

$$\int_\Sigma \omega_j = \int \omega_j(\partial_t, \partial_u, \partial_v)\, dt\,du\,dv ,$$

where $dx^k(\partial_y)$ is defined as $dx^k(\partial_y) = \partial x^k/\partial y$. Naturally, this yields the same as the
determinant above.

The 3-form is to be replaced by a $3N$-form in the case of $N$ particles. The
corresponding current is $(3N + 1)$-dimensional and it gets integrated over $3N$-dimensional
hypersurfaces, e.g., over the $t = \text{const.}$ hyperplanes, yielding the quantum
equilibrium distribution. ∎


16.2 Asymptotic Exits 


We shall now analyze a physical situation where we expect the positivity condition
(16.4) to apply, and hence also (16.7), and we thus have a formula for the crossing
probability. This is the situation in which the detectors are far away from where
the wave function is initially localized, or where, in Fig. 16.3, the boundary $\partial G$
is far away. We connect this now with the asymptotic form of the wave function
which we discussed in Remark 15.8, and which says heuristically that the Bohmian
trajectories move asymptotically along straight lines.

For simplicity, we take for $G$ a ball of radius $R$, and consider the exit position
$X_e$ of the particle through a piece of the spherical surface $\Sigma_R = R\Sigma$, with $\Sigma$ a subset
of the unit sphere (see Fig. 16.4). When $R$ gets large, i.e., when we take the large $R$ limit
of (16.7), and with $t$ integrated over all times (since we are only interested in the
position), we should have

$$\lim_{R\to\infty}\mathbb{P}^\psi(X_e \in \Sigma_R) = \lim_{R\to\infty}\int_0^\infty\int_{\Sigma_R} |j^\psi\cdot dS|\, dt = \lim_{R\to\infty}\int_0^\infty\int_{\Sigma_R} j^\psi\cdot dS\, dt . \tag{16.11}$$



Fig. 16.4 Geometry for the computation of a flux across a surface 


Note that the equality of the last two integrals ensures that the flux integral is positive,
whence it does indeed give the crossing probability.
To compute this, we recall (15.10) (setting $\hbar/m = 1$ for notational convenience)

$$\psi(x,t) \overset{x,t \text{ large}}{\approx} \frac{1}{(it)^{3/2}}\, e^{ix^2/2t}\, \hat\psi\Big(\frac{x}{t}\Big) , \tag{16.12}$$

where $\hat\psi$ is the Fourier transform of the wave function $\psi$ at time zero. This asymptotic
form of the wave function says that, at large times, the wave function will be
at positions $x$ for which $x/t \in \operatorname{supp}\hat\psi$.

We shall now carry out an instructive calculation which, in some sense, has already
been done in (15.11), whence the result will not be particularly surprising. It
is nevertheless gratifying to see how everything fits together. We replace the flux
in (16.11) by the flux of the asymptotic wave function (16.12) for which we first
observe that, in

$$\nabla\Big[\frac{e^{ix^2/2t}}{(it)^{3/2}}\,\hat\psi\Big(\frac{x}{t}\Big)\Big] = \frac{ix}{t}\,\frac{e^{ix^2/2t}}{(it)^{3/2}}\,\hat\psi\Big(\frac{x}{t}\Big) + \frac{1}{t}\,\frac{e^{ix^2/2t}}{(it)^{3/2}}\,(\nabla\hat\psi)\Big(\frac{x}{t}\Big) ,$$

the second term is of smaller order since $x/t = \mathcal{O}(1)$, due to the support condition
just mentioned. Therefore, for large $x,t$,

$$j^\psi(x,t) = \operatorname{Im}\big(\psi_t^*\nabla\psi_t\big) \approx \frac{x}{t}\Big(\frac{1}{t}\Big)^3\Big|\hat\psi\Big(\frac{x}{t}\Big)\Big|^2 . \tag{16.13}$$

This means that the flux is asymptotically radial, i.e., $j^\psi(x,t) \parallel x = \mathbf{R}$ on the sphere of
radius $R$, so that on the surface,

$$j^\psi\cdot dS > 0 .$$

This takes care of the equality of the integrated absolute flux and the integrated flux
in (16.11). To compute (16.11), observe the following:

1. The surface element reads $dS = R\,\mathbf{R}\, d^2\omega$, with $\mathbf{R}$ the position vector on the sphere
($|\mathbf{R}| = R$) and $d^2\omega$ the surface element of the unit sphere.

2. Substituting $k = \mathbf{R}/t$ for $t$, with modulus $|k| = R/t$, yields $d|k| = -|k|\,dt/t$ for the new
integration variable in the integral.

3. $C_\Sigma$ denotes the cone with solid angle $\Sigma$, as depicted in Fig. 16.4.


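Then

$$
\begin{aligned}
\int_0^\infty\int_{\Sigma_R} j^\psi\cdot dS\, dt &\approx \int_0^\infty\int_{\Sigma_R} \frac{x}{t}\Big(\frac{1}{t}\Big)^3\Big|\hat\psi\Big(\frac{x}{t}\Big)\Big|^2\cdot dS\, dt \\
&= \int_0^\infty\int_{\Sigma} \frac{R^3}{t^4}\,\Big|\hat\psi\Big(\frac{\mathbf{R}}{t}\Big)\Big|^2 d^2\omega\, dt \\
&= \int_0^\infty d|k|\,|k|^2\int_\Sigma d^2\omega\,\big|\hat\psi(|k|\omega)\big|^2 = \int_{C_\Sigma} |\hat\psi(k)|^2\, d^3k .
\end{aligned} \tag{16.14}
$$

Hence, for (16.11), we get

$$\lim_{R\to\infty}\mathbb{P}^\psi(X_e \in \Sigma_R) = \lim_{R\to\infty}\int_0^\infty\int_{\Sigma_R} j^\psi\cdot dS\, dt = \int_{C_\Sigma} |\hat\psi(k)|^2\, d^3k ,$$

or a little less precisely,

$$\mathbb{P}^\psi(X_e \in \Sigma_R) \overset{R \text{ large}}{\approx} \int_{C_\Sigma} |\hat\psi(k)|^2\, d^3k . \tag{16.15}$$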


There are various mathematically rigorous assertions around (16.15). The asymp- 
totic equality between the integrated absolute flux and the integrated flux in (16.11) 
is part of the so-called flux-across-surfaces theorem, of which many versions have 
been proven (see Remark 16.2). The above result is made rigorous in [7]. 
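The substitution $|k| = R/t$ behind (16.14) can be checked numerically for the radial factor alone: for any fixed direction $\omega$, the time integral of $(R^3/t^4)\,|\hat\psi(R\omega/t)|^2$ should equal $\int_0^\infty |k|^2 |\hat\psi(|k|\omega)|^2\, d|k|$, independently of $R$. The sketch below assumes a hypothetical rotation-invariant profile $|\hat\psi|^2 = e^{-|k|^2}$ (unnormalised), for which the momentum integral is $\sqrt{\pi}/4$; the quadrature in the variable $u = \ln t$ and the function names are our choices.

```python
import math

def radial_profile(k):
    """Hypothetical |psi_hat|^2 as a function of the modulus k (unnormalised Gaussian)."""
    return math.exp(-k * k)

def time_integral(R, n=20000):
    """Integral over t in (0, infinity) of (R**3 / t**4) * profile(R / t),
    computed with the trapezoid rule in u = log(t), where dt = t du."""
    lo, hi = math.log(R) - 5.0, math.log(R) + 12.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (R ** 3 / t ** 4) * radial_profile(R / t) * t
    return total * h

def momentum_integral():
    """Closed form of the integral of k**2 * exp(-k**2) over k in (0, infinity)."""
    return math.sqrt(math.pi) / 4.0

if __name__ == "__main__":
    for R in (1.0, 10.0, 100.0):
        print(R, time_integral(R), momentum_integral())
```

That the result does not depend on $R$ reflects the fact that, in the asymptotic regime, every sphere is crossed by the same straight-line trajectories.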

The formula (16.15) is basic to scattering theory. In scattering situations, the
detectors are far away from the scattering centers, and after the scattered wave has
left the scattering potential, i.e., when it is far away, it does move freely. Therefore
(16.15) applies to scattering, with one caveat, however. The Fourier transform of
the wave function $\psi$ at time zero appears in (16.15). Of course, time zero could be
any time, but which $\psi$ is appropriate in formula (16.15) when the wave function
also interacts with a potential, i.e., when it does not move freely all the time? One
might have the idea of taking $\psi_T$ for some large $T$, a time whereafter the wave
function does move freely. But such a time is not sharply defined. Mathematically at
least, there will always be some part of the wave function overlapping the potential.
The question then is: Does there exist an asymptotic expression for a given $\psi$ for
large times, with which we can compute large time statistics? We shall address this
question next. In fact, we shall compute the exit distribution now in the physically
relevant situation of scattering. In doing so, we shall restrict ourselves to the simplest
case of one-particle potential scattering.


16.3 Scattering Theory and Exit Distribution 


We continue with our discussion of the exit distribution and consider its relevance
for scattering situations. In scattering theory the wave does not evolve freely. There
is a scattering potential $V$ which influences the evolution of the wave, as shown in
Fig. 16.1, i.e., the Schrödinger operator is $H = H_0 + V$. In scattering theory, we picture
a wave approaching the scattering potential from far away, interacting, and then
leaving as a “scattered wave”. Far away from the potential, where the detectors wait
for the particle to arrive, the wave should move freely (assuming that the potential
falls off fast enough). Such wave functions will be called scattering states.

How can one phrase in mathematical terms the fact that a wave moves asymptotically
freely? The easiest way to approach this is perhaps to think of asymptotics in
time rather than spatial asymptotics, since these should be roughly equivalent points
of view. The question is then: What is the asymptotic form of $e^{-itH}\psi$ for large $t$?
Tracing the time evolution of the wave function $e^{-itH}\psi$ as a path in Hilbert space
helps to answer this (see Fig. 16.5). We can picture free motion asymptotically in
time by the path approaching an asymptote which is defined by a freely moving
wave function $(e^{-itH_0}\psi_{\mathrm{out}})_{t\in\mathbb{R}}$, sketched in Fig. 16.5 as a “straight line” in Hilbert
space. This suggests making the requirement

$$\lim_{t\to\infty}\big\|e^{-itH}\psi - e^{-itH_0}\psi_{\mathrm{out}}\big\| = 0 . \tag{16.16}$$


This is a requirement for the existence of the wave function $\psi_{\mathrm{out}}$ whose free evolution
defines the asymptote for the evolution of $\psi$. All wave functions $\psi$ for which
such an outgoing (and ingoing, see below) asymptote exists are called scattering
states of the Hamiltonian $H$.

Since $\psi_{\mathrm{out}}(t)$ evolves freely in time, we know of course that (16.12) holds for
$\psi_{\mathrm{out}}$, and (16.16) then suggests that

$$\psi(x,t) \overset{x,t \text{ large}}{\approx} \frac{1}{(it)^{3/2}}\, e^{ix^2/2t}\, \hat\psi_{\mathrm{out}}\Big(\frac{x}{t}\Big) . \tag{16.17}$$


One may hope that the amount of time in which the scattering wave is influenced
by the potential is in a reasonable sense limited, so that the replacement we made
in (16.13) to compute the flux integral (16.14) can also be made with (16.17). Then
from (16.15), replacing $\hat\psi$ by $\hat\psi_{\mathrm{out}}$, we obtain the formula

$$\mathbb{P}^\psi(X_e \in \Sigma_R) \overset{R \text{ large}}{\approx} \int_{C_\Sigma} |\hat\psi_{\mathrm{out}}(k)|^2\, d^3k . \tag{16.18}$$

Fig. 16.5 Schematic representation of wave function evolution in Hilbert space. The time evolution
of a scattering state $\psi_0$ is shown at some arbitrary time, e.g., $t = 0$. Associated with it are two
states $\psi_{0,\mathrm{in}}$ and $\psi_{0,\mathrm{out}}$ which evolve freely. The backwards-in-time evolved $\psi_{0,\mathrm{in}}$ is the asymptote
for $t \to -\infty$ (16.43) and the forward-in-time evolved $\psi_{0,\mathrm{out}}$ is the asymptote for $t \to \infty$. See text for
explanation


Remark 16.2. On Scattering into Cones and the Flux Across Surfaces 

The formula (16.18) can be turned into a mathematically rigorous assertion, usually
referred to as the flux-across-surfaces theorem [7–11]. It holds for a large class
of potentials. The expression (16.18) gives the probability that the particle exits
through the surface piece $\Sigma_R$ (i.e., gets detected by the detector which covers that
solid angle). The explanation as to why the probability is given by the momentum
distribution of $\psi_{\mathrm{out}}$ is that the particle gets scattered into the (spatial) cone $C_\Sigma$. In
other words, because the particle moves more or less along a straight line (a ray
given by the direction of the momentum $\hbar k$, which lies in the cone $C_\Sigma$), it crosses
the detector surface. This idea also underlies the so-called scattering-into-cones theorem
[12, 13], which asserts that the probability that the particle is in the cone $C_\Sigma$ at
time $T$ is given by the probability that the momentum lies in the cone, provided $T$
is large enough. In other words this probability is also given by the right-hand side
of (16.18):

$$\lim_{t\to\infty}\int_{C_\Sigma} |\psi(x,t)|^2\, d^3x = \int_{C_\Sigma} |\hat\psi_{\mathrm{out}}(k)|^2\, d^3k .$$

In contrast, the flux-across-surfaces theorem connects directly with the crossing of
the detector surface at random times, and its proof supports the picture of straight
line trajectories. But the rigorous assertion that the Bohmian trajectories become
straight lines requires a few more technicalities, and has been established in [11].

The scattering-into-cones theorem, and in particular the flux-across-surfaces theorem,
have been put forward as fundamental assertions for many-particle scattering
theory [14]. There is of course no problem with writing down the $N$-particle quantum
flux

$$j^\psi(x_1,\ldots,x_N,t) = \operatorname{Im}\big(\psi^*(x_1,\ldots,x_N,t)\nabla\psi(x_1,\ldots,x_N,t)\big) ,$$

but what does it mean? For many-particle scattering there is not just one random
time at which the detector clicks, but rather $N$ random times at which the $N$ particles
arrive at the detectors. The $N$-particle quantum flux does not handle that.
It has therefore been observed [15] that, in many-particle scattering, the quantum
flux loses its meaning, and that the relevant crossing probabilities of the scattered
particles through various detectors must be based on the Bohmian trajectories and
the Boltzmann collision cylinder argument leading to (16.6). This argument can be
straightforwardly generalized to the case of many particles. A many-particle scattering
version of (16.18) based on Bohmian trajectories has accordingly been advanced
in [16]. ∎
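A one-dimensional free analogue of the scattering-into-cones statement can be evaluated in closed form. For a free Gaussian packet with mean momentum $k_0$ there is no potential, so $\psi_{\mathrm{out}} = \psi$, and the “cone” is simply the half-line of positive positions or momenta. The sketch below compares $\mathbb{P}(X_t > 0)$ with $\mathbb{P}(k > 0)$. It is only an illustration under our assumptions: units with $\hbar = m = 1$, a packet initially centred at the origin with position spread `sigma0`, and function names of our own choosing.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_right_at_time(t, k0, sigma0):
    """P(X_t > 0): |psi_t|^2 is Gaussian with mean k0 * t and the usual free spreading."""
    spread = sigma0 * math.sqrt(1.0 + (t / (2.0 * sigma0 ** 2)) ** 2)
    return normal_cdf(k0 * t / spread)

def prob_momentum_right(k0, sigma0):
    """P(k > 0) under |psi_hat|^2, a Gaussian with mean k0 and spread 1 / (2 * sigma0)."""
    return normal_cdf(2.0 * sigma0 * k0)

if __name__ == "__main__":
    k0, sigma0 = 0.5, 1.0
    for t in (0.5, 5.0, 50.0, 1.0e4):
        print(t, prob_right_at_time(t, k0, sigma0), prob_momentum_right(k0, sigma0))
```

At early times the position probability differs markedly from the momentum probability; only for large $t$ do the two agree, which is the content of the theorem in this trivial setting.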


16.4 More on Abstract Scattering Theory 


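Formula (16.16) leads directly to the definition of the operator

$$W_+ := \operatorname*{s\text{-}lim}_{t\to\infty}\, e^{itH}e^{-itH_0} , \tag{16.19}$$

where $\operatorname*{s\text{-}lim}_{t\to\infty}$ indicates the strong $L^2$-limit

$$\lim_{t\to\infty}\big\|\big(W_+ - e^{itH}e^{-itH_0}\big)\varphi\big\| = 0 , \quad \text{for all } \varphi \in L^2 .$$

In other words,

$$\psi = W_+\psi_{\mathrm{out}} . \tag{16.20}$$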


Figure 16.5 shows the maps $W_\pm$ (where $W_-$ will be introduced further below) at
time $t = 0$, which is an arbitrary time. It will soon become clear that shifting time
will only produce an irrelevant phase factor in the formulas relevant for us. That is
why we omit the time index 0.

We have cheated a bit in representing the situation in such an innocent-looking
way. We really want to find $\psi_{\mathrm{out}}$ for given $\psi$, so that the true aim is the inverse
operator $W_+^{-1}$, which is defined on $\operatorname{Ran}(W_+)$, the range of $W_+$. On that,

$$W_+^{-1} = W_+^* = \operatorname*{s\text{-}lim}_{t\to\infty}\, e^{itH_0}e^{-itH} . \tag{16.21}$$

To know the domain of $W_+^{-1}$, i.e., to know $\operatorname{Ran}(W_+)$, is a way of phrasing the so-called
completeness problem, which we shall elaborate on below. In short, we do not
know beforehand which wave functions are scattering states. In fact, this is exactly
what we wish to find out. On the other hand, the operator $W_+$ acts naturally on $L^2$
(provided the limit exists), because all states evolve under the free dynamics into
almost plane outgoing wave packets which, when evolved backwards in time with
the full time evolution, become scattering states, provided the Hilbert space sketch
of the asymptotic approach shown in Fig. 16.5 is correct. The latter should be the
case if the potential falls off “fast enough” at infinity.

The operator is called the wave operator, and it was one of the earliest inventions
in scattering theory. In other accounts of scattering theory the wave operator $W_+$ is
often written as $\Omega_-$, where the difference is mainly a sign convention, explained in
footnote 3. In our notation, the $+$ sign indicates that the time limit in the definition
of $W_+$ is for large positive times. Later on we shall introduce $W_-$, in which the limit
towards large negative times is considered.

How can one get a handle on the limit in the definition of the wave operator? With
a simple trick, as old as the wave operator itself. Since the wave operator encodes
the potential $V$, we need a condition on the potential which ensures the existence of
the limit. So how can we make the potential visible in the wave operator $W_+$? The
answer is, by writing

$$
\begin{aligned}
W_+ = \lim_{T\to\infty} e^{iTH}e^{-iTH_0} &= 1 + \lim_{T\to\infty}\int_0^T \frac{d}{dt}\big(e^{itH}e^{-itH_0}\big)\, dt \\
&= 1 + \lim_{T\to\infty}\int_0^T i\,e^{itH}(H - H_0)\,e^{-itH_0}\, dt \\
&= 1 + \lim_{T\to\infty}\int_0^T i\,e^{itH}\,V\,e^{-itH_0}\, dt .
\end{aligned} \tag{16.22}
$$
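The identity under the limit in (16.22) is pure algebra and holds for any finite $T$, so it can be tested on matrices. The sketch below uses a hypothetical two-level toy model, $H_0 = \operatorname{diag}(E_1, E_2)$ and $V$ an off-diagonal coupling $g$; all numbers and names are our assumptions. In such a finite-dimensional model the limit $T\to\infty$ does not exist (there is no scattering), so only the finite-$T$ identity is checked.

```python
import cmath
import math

E1, E2, G = 0.3, 1.1, 0.4            # hypothetical levels of H0 and coupling in V
V = [[0.0, G], [G, 0.0]]

def matmul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def exp_itH(t):
    """exp(i t H) for H = H0 + V, written as H = c*1 + d*sigma_z + G*sigma_x."""
    c, d = 0.5 * (E1 + E2), 0.5 * (E1 - E2)
    w = math.hypot(d, G)
    phase = cmath.exp(1j * t * c)
    cs, sn = math.cos(w * t), math.sin(w * t)
    return [[phase * (cs + 1j * sn * d / w), phase * (1j * sn * G / w)],
            [phase * (1j * sn * G / w), phase * (cs - 1j * sn * d / w)]]

def exp_minus_itH0(t):
    """exp(-i t H0) for the diagonal free Hamiltonian."""
    return [[cmath.exp(-1j * t * E1), 0j], [0j, cmath.exp(-1j * t * E2)]]

def duhamel_lhs(T):
    """exp(i T H) exp(-i T H0)."""
    return matmul(exp_itH(T), exp_minus_itH0(T))

def duhamel_rhs(T, n=4000):
    """1 + i * integral_0^T exp(i t H) V exp(-i t H0) dt, by the midpoint rule."""
    h = T / n
    acc = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        t = (k + 0.5) * h
        term = matmul(matmul(exp_itH(t), V), exp_minus_itH0(t))
        for i in range(2):
            for j in range(2):
                acc[i][j] += 1j * h * term[i][j]
    return [[(1.0 if i == j else 0.0) + acc[i][j] for j in range(2)] for i in range(2)]

if __name__ == "__main__":
    left, right = duhamel_lhs(5.0), duhamel_rhs(5.0)
    print(max(abs(left[i][j] - right[i][j]) for i in range(2) for j in range(2)))
```

The printed deviation is at the level of the quadrature error. The analytic work in scattering theory lies entirely in controlling the integral as $T\to\infty$, which this toy model cannot show.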
This gives us now the opportunity to see the stationary phase method (15.44) at
work. The task of establishing that the wave operator $W_+$ is well defined is now
reduced by (16.22) to showing the existence of the integral on $L^2$. Thus it is sufficient
to establish that, on a dense set of wave functions,

$$\lim_{T\to\infty}\int_0^T \big\|e^{itH}Ve^{-itH_0}\psi\big\|\, dt = \lim_{T\to\infty}\int_0^T \big\|Ve^{-itH_0}\psi\big\|\, dt < \infty .$$

In view of (15.44), this is easy if we lay down some conditions which we choose
here for the sake of simplicity, while preserving the spirit of what needs to be done.
The dense set we choose is the set of wave functions with support in Fourier space
bounded away from zero, let us say by a distance $a$, so that $k \in \operatorname{supp}\hat\psi \Rightarrow |k| > a$.
Then we split the integrand, introducing the characteristic function $\chi$,

$$
\begin{aligned}
\big\|Ve^{-itH_0}\psi\big\| &= \big\|V\big(\chi(|x| < at) + \chi(|x| > at)\big)e^{-itH_0}\psi\big\| \\
&\le \big\|V\chi(|x| < at)\,e^{-itH_0}\psi\big\| + \big\|V\chi(|x| > at)\,e^{-itH_0}\psi\big\| .
\end{aligned}
$$


If $V$ is bounded in the operator norm, we can pull $\|V\|$ out of the first term, and what
remains is integrable in time by virtue of the stationary phase argument (15.44). For
the second term, we assume that $|V(x)| \sim |x|^{-1-\varepsilon}$, so that

$$\big\|V\chi(|x| > at)\,e^{-itH_0}\psi\big\| \le \big\|V\chi(|x| > at)\big\|\,\|\psi\| \sim t^{-1-\varepsilon} ,$$

which is also integrable in time.


Remark 16.3. On the Scattering Program in Mathematical Physics 

Since we are talking about the behavior of wave functions, let us take the opportunity
of talking about a classification of wave functions which is of interest in the
mathematical physics of scattering theory. Given a Schrödinger operator there are
three different spectral subspaces of the Hilbert space, as discussed at the end of
Chap. 15. There is the subspace $\mathcal{H}_{pp}$ belonging to the pure point spectrum spanned
by eigenfunctions. These are stationary or bound states, and they do not move to
infinity, whence the particles also remain in finite regions.

Then there is the subspace $\mathcal{H}_{\mathrm{cont}}$ belonging to the continuous spectrum. Can
the wave functions in that subspace also be dynamically characterized? Now, the
continuous spectrum splits into two spectra, the absolutely continuous spectrum and
the singular one. Correspondingly, one has two more subspaces $\mathcal{H}_{ac}$ and $\mathcal{H}_{sc}$. What
can be said about these spaces? Writing

ellt+s)H @—i(t+s)Ho = eH gitH .—itHo e isHo 


we obtain the so called intertwining property 
ew, = W eto | (16.23) 


from which we infer that the range $\operatorname{Ran}(W_+)$ is invariant under the $H$-time evolution.
Strong differentiation with respect to $s$ of (16.23) yields

\[
H W_+ = W_+ H_0 , \tag{16.24}
\]

and thus, since

\[
W_+^{-1} = W_+^*
\]

on $\operatorname{Ran}(W_+)$, we have

\[
W_+^* H W_+ = H_0 . \tag{16.25}
\]

This says that the restriction of $H$ to $\operatorname{Ran}(W_+)$ is unitarily equivalent to $H_0$. And
this in turn means that the restriction has absolutely continuous spectrum (see
Remark 15.12), so that we get the important "a priori" result

\[
\operatorname{Ran}(W_+) \subset \mathcal{H}_{\mathrm{ac}} . \tag{16.26}
\]

Completeness of the scattering problem now means that $\operatorname{Ran}(W_+) = \mathcal{H}_{\mathrm{ac}}$, and the
next task is to show that. For the singular part of the spectrum, one wants to show
asymptotic completeness, namely that it is an empty set, so that one does not need
to worry about what those wave functions do. In other words,
$\mathcal{H}_{\mathrm{cont}} = \mathcal{H}_{\mathrm{ac}}$ (see for instance [4, 17, 18]).


We may now invert (16.20), i.e., we write

\[
\psi_{\mathrm{out}} = W_+^{-1} \psi , \tag{16.27}
\]

and introducing this into (16.18), we can also express the exit distribution as

\[
\mathbb{P}^\psi (X_e \in \Sigma_R) \overset{R\ \text{large}}{\approx}
\int_{C_\Sigma} |\hat\psi_{\mathrm{out}}(\mathbf{k})|^2\, d^3k
= \int_{C_\Sigma} \left| \widehat{W_+^{-1}\psi}(\mathbf{k}) \right|^2 d^3k . \tag{16.28}
\]

This is the formula from which the scattering cross-section, the basic empirical import
of scattering theory, arises. Note also that changing the origin of time produces a
phase factor $e^{-ik^2 t/2}$ in the Fourier transform, because $\psi_{\mathrm{out}}$ evolves with the free time
evolution or because of the intertwining property (16.24). This factor then drops out.

Before we compute the scattering cross-section, we shall allow ourselves a short
interlude at this point to introduce another technically important notion, namely the
notion of generalized eigenfunctions. They can also be used to phrase the long time
asymptotics of the wave function, but they do much more than that: they diagonalize
the Hamiltonian on the subspace $\mathcal{H}_{\mathrm{ac}}$.


16.5 Generalized Eigenfunctions 


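Another important and somewhat less abstract notion, but equivalent to the idea of
wave operators, is the notion of generalized eigenfunctions. They can be introduced
in a straightforward manner, but since we have already talked about the wave operator,
we shall introduce them using the wave operator. Let us get more familiar with
the wave function in (16.28):

\[
\hat\psi_{\mathrm{out}}(\mathbf{k}) = \widehat{W_+^{-1}\psi}(\mathbf{k}) = \langle \mathbf{k} | W_+^{-1} \psi \rangle . \tag{16.29}
\]

By unitarity of the wave operator, $W_+^{-1} = W_+^*$ on $\operatorname{Ran}(W_+)$, this may be written as

\[
\widehat{W_+^{-1}\psi}(\mathbf{k}) = \langle W_+ \mathbf{k} | \psi \rangle . \tag{16.30}
\]

Recalling that

\[
-\frac{1}{2} \Delta\, e^{i\mathbf{k}\cdot\mathbf{x}} = \frac{k^2}{2}\, e^{i\mathbf{k}\cdot\mathbf{x}} ,
\]

which means that

\[
H_0 |\mathbf{k}\rangle = \frac{k^2}{2} |\mathbf{k}\rangle ,
\]

and in view of (16.24), we see that the vector

\[
|+,\mathbf{k}\rangle := W_+ |\mathbf{k}\rangle \tag{16.31}
\]

is an eigenvector of $H$, albeit a generalized one:

\[
H |+,\mathbf{k}\rangle = H W_+ |\mathbf{k}\rangle = W_+ H_0 |\mathbf{k}\rangle
= \frac{k^2}{2} W_+ |\mathbf{k}\rangle = \frac{k^2}{2} |+,\mathbf{k}\rangle . \tag{16.32}
\]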


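The "x representation" yields the generalized eigenfunction

\[
\varphi_+(\mathbf{x},\mathbf{k}) := (2\pi)^{3/2} \langle \mathbf{x} | W_+ | \mathbf{k} \rangle
= (2\pi)^{3/2} \langle \mathbf{x} | +,\mathbf{k} \rangle , \tag{16.33}
\]

which is not square integrable, and hence not an element of $L^2$. But this is in no way
disquieting since we are already familiar with

\[
\langle \mathbf{x} | \mathbf{k} \rangle = (2\pi)^{-3/2} e^{i\mathbf{k}\cdot\mathbf{x}} ,
\]

the generalized eigenfunctions of $H_0$, which are normalized to
$\langle \mathbf{k} | \mathbf{k}' \rangle = \delta(\mathbf{k}-\mathbf{k}')$.
This is exactly how we should think of the generalized eigenfunctions $\varphi_+(\mathbf{x},\mathbf{k})$
[likewise normalized to $\delta(\mathbf{k}-\mathbf{k}')$], namely as wave functions which diagonalize $H$
restricted to $\operatorname{Ran}(W_+)$ in the very same sense as the Fourier transform diagonalizes
$H_0$. The generalized eigenfunctions play the same role as the plane waves for the
free Hamiltonian. They solve the eigenvalue equation

\[
H \varphi(\mathbf{x},\mathbf{k}) = \frac{k^2}{2}\, \varphi(\mathbf{x},\mathbf{k}) \tag{16.34}
\]
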


in the space of bounded functions, and one can take this as the starting point for 
scattering theory. 

The problem of completeness is now expressible as the problem of establishing
an isometry between $L^2$ and $\mathcal{H}_{\mathrm{ac}}$ via a generalized Fourier transform, taking as
"basis" elements the generalized eigenfunctions. This is completely analogous to
the way plane waves define the Fourier transformation. Combining (16.29), (16.30),
(16.31), and (16.33), we see that the function $\hat\psi_{\mathrm{out}}$, which is the Fourier transform
of $\psi_{\mathrm{out}}$, is also the generalized Fourier transform of $\psi \in \mathcal{H}_{\mathrm{ac}}$:

\[
\hat\psi_{\mathrm{out}}(\mathbf{k}) = (2\pi)^{-3/2} \int \varphi_+^*(\mathbf{x},\mathbf{k})\, \psi(\mathbf{x})\, d^3x . \tag{16.35}
\]

[We have ignored here the mathematical detail that the Fourier transform is not
an integral transformation in general, i.e., the integral is in general defined via an
approximation (see Chap. 14).] Like the ordinary Fourier transform, the generalized
Fourier transform on $L^2$ is an isometry, represented here by the wave operators.
The eigenfunction expansion (16.35) is a powerful tool for studying the time
evolution of the wave function, which, for the free Hamiltonian, is of course
well known. One can get a good handle on the generalized eigenfunctions when
one rewrites the eigenvalue equation (16.34) in integral form. This is called the
Lippmann–Schwinger equation. We start with the eigenvalue equation (16.32),


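\[
H |+,\mathbf{k}\rangle = (H_0 + V) |+,\mathbf{k}\rangle = \frac{k^2}{2} |+,\mathbf{k}\rangle ,
\]

and reorder the terms to get

\[
\left( H_0 - \frac{k^2}{2} \right) |+,\mathbf{k}\rangle = -V |+,\mathbf{k}\rangle .
\]

We want to solve this for $|+,\mathbf{k}\rangle$, but the operator $(H_0 - k^2/2)$ is not invertible, since

\[
\left( H_0 - \frac{k^2}{2} \right) |\mathbf{k}\rangle = 0 .
\]

Observing this, we may write the inversion formally in a suggestive way as

\[
|+,\mathbf{k}\rangle = -\left( H_0 - \frac{k^2}{2} \right)^{-1} V |+,\mathbf{k}\rangle + |\mathbf{k}\rangle ,
\]

with the idea in mind that the generalized eigenfunctions should be asymptotically,
i.e., far from the range of the potential, like plane waves, and where $(H_0 - k^2/2)^{-1}$
must be defined in an appropriate way, as spelt out below. In other words, we look for
solutions of the eigenvalue equation (16.34) which satisfy the boundary condition

\[
\lim_{|\mathbf{x}| \to \infty} \left[ \varphi_\pm(\mathbf{x},\mathbf{k}) - e^{i\mathbf{k}\cdot\mathbf{x}} \right] = 0 .
\]

This is the formal Lippmann–Schwinger equation, but we still need to find the
proper Green's function, i.e., the kernel of $(H_0 - k^2/2)^{-1}$.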

There are various ways to arrive at the Green's function. Since we introduced the
eigenfunction as kernel function of the wave operator in (16.33), the straightforward
way starts from a variant of (16.22). By virtue of (16.33),

\[
|\mathbf{k}\rangle = W_+^* |+,\mathbf{k}\rangle ,
\]

and in view of (16.22), we write

\[
W_+^* = 1 - \lim_{T\to\infty} \int_0^T i\, e^{itH_0}\, V\, e^{-itH}\, dt . \tag{16.36}
\]



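Combining these, we compute

\[
\begin{aligned}
|\mathbf{k}\rangle = W_+^* |+,\mathbf{k}\rangle
&= |+,\mathbf{k}\rangle - \lim_{T\to\infty} \int_0^T i\, e^{itH_0}\, V\, e^{-itH} |+,\mathbf{k}\rangle\, dt \\
&= |+,\mathbf{k}\rangle - \lim_{T\to\infty} \int_0^T i\, e^{itH_0}\, V\, e^{-itk^2/2} |+,\mathbf{k}\rangle\, dt \\
&= |+,\mathbf{k}\rangle - \lim_{T\to\infty} \int_0^T i\, e^{it(H_0 - k^2/2)}\, V |+,\mathbf{k}\rangle\, dt ,
\end{aligned}
\]

using the intertwining property $e^{-itH} |+,\mathbf{k}\rangle = e^{-itk^2/2} |+,\mathbf{k}\rangle$ in the third equality. The
integral is again the undefined inverse of $H_0 - k^2/2$, but we now take the Abel limit,
which exists:

\[
|\mathbf{k}\rangle = |+,\mathbf{k}\rangle - \lim_{\varepsilon \downarrow 0} \int_0^\infty
i \exp\left[ it \left( H_0 - \frac{k^2}{2} + i\varepsilon \right) \right] V |+,\mathbf{k}\rangle\, dt .
\]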


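But we have already computed this! The integral is, up to the sign, the resolvent

\[
\int_0^\infty i \exp\left[ it \left( H_0 - \frac{k^2}{2} + i\varepsilon \right) \right] dt
= - \left( H_0 - \frac{k^2}{2} + i\varepsilon \right)^{-1} ,
\]

which we discussed in Remark 15.10. Substituting $t/2$ for $t$ yields a factor of 2
in front of the integral, and then we put $\kappa^2 = -k^2 + 2i\varepsilon$ and
$\lim_{\varepsilon \downarrow 0} \kappa = ik$ in Remark 15.10,
in accordance with our agreement to take the positive real part. Hence
in the "x representation" this becomes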


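\[
\langle \mathbf{x} | \mathbf{k} \rangle = \langle \mathbf{x} | +,\mathbf{k} \rangle
+ \frac{1}{2\pi} \int \frac{e^{-i|\mathbf{x}-\mathbf{y}|k}}{|\mathbf{x}-\mathbf{y}|}\, V(\mathbf{y})\, \langle \mathbf{y} | +,\mathbf{k} \rangle\, d^3y ,
\]

where the integrand kernel is the Green's function of the Schrödinger equation in
three dimensions. Using (16.33) and reordering yields the Lippmann–Schwinger
equation

\[
\varphi_+(\mathbf{x},\mathbf{k}) = e^{i\mathbf{k}\cdot\mathbf{x}}
- \frac{1}{2\pi} \int \frac{e^{-i|\mathbf{x}-\mathbf{y}|k}}{|\mathbf{x}-\mathbf{y}|}\, V(\mathbf{y})\, \varphi_+(\mathbf{y},\mathbf{k})\, d^3y . \tag{16.37}
\]
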
The equation provides a handle on $\varphi_+(\mathbf{x},\mathbf{k})$, for example by using iterative procedures.
One may start at zeroth order with $e^{i\mathbf{k}\cdot\mathbf{x}}$ replacing $\varphi_+(\mathbf{x},\mathbf{k})$ inside the integral.
Iterating this gives the so-called Born series.
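
The iteration can be tried out numerically. The following is only a minimal sketch, not
the three-dimensional equation (16.37) itself: it assumes a one-dimensional toy model with
$H_0 = -\frac{1}{2}\, d^2/dx^2$, for which the Green's function of $H_0 - k^2/2$ is
$\mp (i/k)\, e^{\mp ik|x-y|}$, and a weak Gaussian potential. The function name, the grid and
all parameter values are illustrative assumptions. The Born iterates are compared with a
direct solution of the discretized integral equation.

```python
import numpy as np

def born_series_1d(k, x, V, n_iter=30, sign=-1):
    """Born iteration for a 1D toy Lippmann-Schwinger equation (units hbar = m = 1).

    sign=-1 uses exp(-ik|x-y|), mirroring the exponent in (16.37);
    sign=+1 uses exp(+ik|x-y|), mirroring the exponent in (16.39).
    """
    dx = x[1] - x[0]
    phi0 = np.exp(1j * k * x)                       # zeroth order: plane wave
    dist = np.abs(x[:, None] - x[None, :])
    # 1D Green's function of H0 - k^2/2 with H0 = -(1/2) d^2/dx^2 (toy-model assumption)
    green = sign * (1j / k) * np.exp(sign * 1j * k * dist)
    kernel = green * V[None, :] * dx                # discretized integral operator
    phi = phi0.copy()
    for _ in range(n_iter):
        phi = phi0 - kernel @ phi                   # each pass adds the next Born term
    return phi, phi0, kernel

x = np.linspace(-10.0, 10.0, 401)
V = 0.1 * np.exp(-0.5 * x**2)                       # hypothetical weak potential
phi_born, phi0, kernel = born_series_1d(2.0, x, V)

# Reference: solve the discretized equation (1 + K) phi = phi0 directly
phi_direct = np.linalg.solve(np.eye(x.size) + kernel, phi0)
born_error = np.max(np.abs(phi_born - phi_direct))
```

For a potential this weak the iteration converges quickly; for strong potentials the Born
series need not converge, and the direct solve is then the safer route.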

In view of the decay of the Green's function, (16.37) seems to suggest that, far
from the scattering potential, the generalized eigenfunctions become ordinary plane
waves. However, the decay of the Green's function is not strong enough to support
the assertion that this also holds in the $L^2$-sense. For that we must appeal to the
stationary phase argument of Remark 15.9, which in fact yields a much stronger
decay in space and time. Expanding $\psi$ in terms of generalized eigenfunctions [see
(16.35)], we get

\[
\begin{aligned}
\psi(\mathbf{x},t) &= (2\pi)^{-3/2} \int e^{-ik^2 t/2}\, \varphi_+(\mathbf{x},\mathbf{k})\, \hat\psi_{\mathrm{out}}(\mathbf{k})\, d^3k \\
&= (2\pi)^{-3/2} \int e^{-ik^2 t/2}\, e^{i\mathbf{k}\cdot\mathbf{x}}\, \hat\psi_{\mathrm{out}}(\mathbf{k})\, d^3k \\
&\quad - (2\pi)^{-3/2} \int e^{-ik^2 t/2} \left( \frac{1}{2\pi} \int \frac{e^{-i|\mathbf{x}-\mathbf{y}|k}}{|\mathbf{x}-\mathbf{y}|}\,
V(\mathbf{y})\, \varphi_+(\mathbf{y},\mathbf{k})\, d^3y \right) \hat\psi_{\mathrm{out}}(\mathbf{k})\, d^3k .
\end{aligned}
\tag{16.38}
\]

The first term contains the phase term of the free evolution, which yields, by the
stationary phase argument, in accordance with (15.10), the new general asymptotics
(16.17). The second term contains the phase function $-|\mathbf{x}-\mathbf{y}|k - k^2 t/2$, which has
no stationary point for positive times $t$, and hence by the stationary phase argument
in Remark 15.9, the term shows arbitrary polynomial decay in time and space, provided
$\hat\psi_{\mathrm{out}}$ is smooth enough. Hence the long time and long distance behavior for
positive times is governed by the first term, the free evolution. This is therefore
another less abstract, and consequently more physical way to capture asymptotic
free motion. There is no need to appeal to wave operators: simply use generalized
eigenfunctions.
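
The free-evolution asymptotics behind the first term can be checked numerically. The sketch
below is a one-dimensional analogue only, under the assumption of a Gaussian momentum
amplitude with hypothetical parameters $k_0 = 2$ and $\sigma = 1$ (units $\hbar = m = 1$).
It compares the position density at a large time with the stationary phase prediction,
which in one dimension reads $|\psi(x,t)|^2 \approx t^{-1} |\hat\psi(x/t)|^2$.

```python
import numpy as np

def gaussian_hat(k, k0=2.0, sigma=1.0):
    """Normalized Gaussian momentum amplitude (hypothetical example packet)."""
    return (sigma**2 / np.pi) ** 0.25 * np.exp(-0.5 * sigma**2 * (k - k0) ** 2)

def free_evolution_1d(x, t, k):
    """psi(x,t) = (2 pi)^(-1/2) * int exp(i k x - i k^2 t / 2) psi_hat(k) dk (Riemann sum)."""
    dk = k[1] - k[0]
    phase = np.exp(1j * (np.outer(x, k) - 0.5 * k**2 * t))
    return (phase @ gaussian_hat(k)) * dk / np.sqrt(2.0 * np.pi)

k = np.linspace(-6.0, 10.0, 16001)      # momentum grid covering the packet
t = 50.0                                # a "large" time
x = np.linspace(50.0, 150.0, 21)        # positions around the classical point k0 * t

density = np.abs(free_evolution_1d(x, t, k)) ** 2
asymptotic = np.abs(gaussian_hat(x / t)) ** 2 / t     # stationary phase prediction
rel_err = np.max(np.abs(density - asymptotic)) / np.max(asymptotic)
```

The relative deviation shrinks as $t$ grows, which is the sense in which the momentum
distribution governs the long time position distribution.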

For reasons to be explained below, we introduce another equally important
"eigenbasis" which is given by another class of solutions $\varphi_-(\mathbf{x},\mathbf{k})$ of the eigenvalue
equation (16.34). They appear when we agree to take the negative real part
$\kappa = -ik$ in the derivation of Remark 15.10. The difference is a change of sign in the
exponent of the integrand in (16.37):

\[
\varphi_-(\mathbf{x},\mathbf{k}) = e^{i\mathbf{k}\cdot\mathbf{x}}
- \frac{1}{2\pi} \int \frac{e^{i|\mathbf{x}-\mathbf{y}|k}}{|\mathbf{x}-\mathbf{y}|}\, V(\mathbf{y})\, \varphi_-(\mathbf{y},\mathbf{k})\, d^3y . \tag{16.39}
\]

These functions define a generalized Fourier transform as well. But instead of
(16.35), we now define

\[
\hat\psi_{\mathrm{in}}(\mathbf{k}) := (2\pi)^{-3/2} \int \varphi_-^*(\mathbf{x},\mathbf{k})\, \psi(\mathbf{x})\, d^3x . \tag{16.40}
\]


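To understand the subscript "in", we reconsider the stationary phase argument we
just went through in (16.38) for this class of eigenfunctions, and obtain analogously

\[
\begin{aligned}
\psi(\mathbf{x},t) &= (2\pi)^{-3/2} \int e^{-ik^2 t/2}\, \varphi_-(\mathbf{x},\mathbf{k})\, \hat\psi_{\mathrm{in}}(\mathbf{k})\, d^3k \\
&= (2\pi)^{-3/2} \int e^{-ik^2 t/2}\, e^{i\mathbf{k}\cdot\mathbf{x}}\, \hat\psi_{\mathrm{in}}(\mathbf{k})\, d^3k \\
&\quad - (2\pi)^{-3/2} \int e^{-ik^2 t/2} \left( \frac{1}{2\pi} \int \frac{e^{i|\mathbf{x}-\mathbf{y}|k}}{|\mathbf{x}-\mathbf{y}|}\,
V(\mathbf{y})\, \varphi_-(\mathbf{y},\mathbf{k})\, d^3y \right) \hat\psi_{\mathrm{in}}(\mathbf{k})\, d^3k .
\end{aligned}
\tag{16.41}
\]

The sign makes all the difference. The phase in the second term has no stationary
point for negative times, so it vanishes when $t \to -\infty$. The first term thus dominates
the large negative time behavior and shows that, for large negative times, the wave
function approaches another asymptotic free evolution, that of $\psi_{\mathrm{in}}$.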

This leads us to the “incoming side” of the scattering process, which has not yet 
been discussed. It is also important, perhaps even more important than the outgoing 
side, because this is after all the part of the scattering experiment which is supposed 
to be under the control of the experimenter. The experimenter usually prepares the 
incoming wave packet with great care. We shall say more on the nature of the in- 
coming wave in the next section. At the moment it suffices to understand how one 
describes mathematically the fact that the incoming wave is prepared, i.e., that it is 
under the control of the experimenter. This means that at the time of preparation, 
the state is not yet influenced by the scattering potential. 

Physically, the preparation is done far enough from the scattering center to ensure
that, at the preparation place, the scattering potential is not felt, i.e., $e^{-itH} \approx e^{-itH_0}$.
In other words, the wave function starts to evolve with the free time evolution. This
is of course to be taken with a pinch of salt. The Schrödinger evolution will immediately
produce a spread-out wave function, so, mathematically, the potential is always
felt, but its influence is "physically negligible" at the time and place where the
wave packet is prepared. The idea is then that, at the time and place of preparation of
the scattering state $\psi$, the state is close (in the $L^2$ sense) to $\psi_{\mathrm{in}}$, i.e.,
$\|\psi - \psi_{\mathrm{in}}\| \approx 0$.
We shall come back to this below.

Obviously, since the generalized eigenfunctions diagonalize H, they provide a 
powerful analytical tool to come to grips with the time evolution of wave functions 
for a general H. Curiously though, the generalized Fourier transform has become 
something of a wallflower in mathematical scattering theory. It has been revitalized 
in recent works on what is called time-dependent scattering theory [9, 10, 19-21], 
and what we present in this chapter is based on these references. 

We can restate the above in terms of a wave operator $W_-$ which, in accordance
with (16.31) and (16.33), is related to $\varphi_-$ according to³

\[
\varphi_-(\mathbf{x},\mathbf{k}) =: (2\pi)^{3/2} \langle \mathbf{x} | W_- \mathbf{k} \rangle
=: (2\pi)^{3/2} \langle \mathbf{x} | -,\mathbf{k} \rangle . \tag{16.42}
\]

The definition of $W_-$ goes like the definition of $W_+$, but with time reversed. Introduce
an asymptotic wave function, namely $\psi_{\mathrm{in}}$, evolving with the free Hamiltonian,
and the scattering state $\psi$ converges under its evolution for $t \to -\infty$ (see Fig. 16.5)
to that freely evolving state. This reads in mathematical terms as

\[
\lim_{t \to -\infty} \left\| e^{-itH} \psi - e^{-itH_0} \psi_{\mathrm{in}} \right\| = 0 , \tag{16.43}
\]

yielding the wave operator

\[
W_- = \operatorname*{s-lim}_{t \to -\infty} e^{itH} e^{-itH_0} . \tag{16.44}
\]

Put another way,

\[
\psi = W_- \psi_{\mathrm{in}} . \tag{16.45}
\]

³ The sign convention which introduces $\Omega_-$ instead of $W_+$ arises from the minus sign in the
exponent of the integrand. Accordingly, $W_-$ is denoted by $\Omega_+$. One may wonder about the physical
meaning of the different signs in the exponent $e^{\mp i|\mathbf{x}-\mathbf{y}|k}/|\mathbf{x}-\mathbf{y}|$ appearing in the
Lippmann–Schwinger equations. They represent outgoing and incoming spherical waves. Their appearance is
directly related to the way the wave operators $W_\pm$ are defined. For example, in $W_+$ [see (16.19)],
the full time evolution acts backwards in time, developing the scattered state $\psi \approx \psi_{\mathrm{out}}$ (for
$t \to \infty$) backwards in time. Therefore from the scattering picture of outgoing spherical waves (see
Fig. 16.6), the time-backwards evolved spherical waves must appear in the representation of $W_+$,
whence $\varphi_+$ appears. By the same token the scattering picture of outgoing spherical waves will be
captured by $\varphi_-$. Equivalently, we can understand the roles of $\varphi_\pm$ by virtue of the stationary phase
arguments in (16.38) and (16.41).


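Inserting this into (16.29) and recalling (16.33), we get

\[
\langle \mathbf{k} | W_+^{-1} \psi \rangle = \langle \mathbf{k} | W_+^* W_- \psi_{\mathrm{in}} \rangle
=: \langle \mathbf{k} | S \psi_{\mathrm{in}} \rangle = \hat\psi_{\mathrm{out}}(\mathbf{k}) . \tag{16.46}
\]

The operator combination $W_+^* W_- =: S$ is called the S-matrix. It maps (unitarily) $\psi_{\mathrm{in}}$
to $\psi_{\mathrm{out}}$: $\psi_{\mathrm{out}} = S \psi_{\mathrm{in}}$. Introducing this into (16.28), we obtain

\[
\mathbb{P}^\psi (X_e \in \Sigma_R) \overset{R\ \text{large}}{\approx}
\int_{C_\Sigma} |\langle \mathbf{k} | S \psi_{\mathrm{in}} \rangle|^2\, d^3k . \tag{16.47}
\]

At every moment of time $t$, the map $W_-^{-1}$ maps the scattering state $\psi_t$ to the
corresponding $\psi_{t,\mathrm{in}}$ (see Fig. 16.5).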

In (16.45) above, $\psi$ can be the scattering state at any arbitrary time and $\psi_{\mathrm{in}}$
the corresponding in state at that time. For the physical interpretation, however, it
is useful to observe that the right-hand side of (16.47) does not change in states
taken at different times, since the S-matrix commutes with the free time evolution,
$S e^{-itH_0} = e^{-itH_0} S$. This holds because the intertwining property (16.24) holds for $W_-$
as well, i.e., $e^{-itH} W_- = W_- e^{-itH_0}$, as one readily sees by repeating the argument for
$W_-$ in place of $W_+$. Applying the corresponding intertwining properties twice then
yields the commutation property of $S$. By the intertwining property, we have
$\psi_{t,\mathrm{in}} = W_-^* e^{-itH} \psi = e^{-itH_0} \psi_{\mathrm{in}}$, and therefore the replacement of
$\psi_{\mathrm{in}}$ by $\psi_{t,\mathrm{in}}$ produces a
phase factor $e^{-ik^2 t/2}$, which arises by evolving the plane wave $|\mathbf{k}\rangle$. This phase factor
drops out, since one takes the absolute square in (16.47).

As already mentioned above, for the physical interpretation one should think of
$\psi_{\mathrm{in}}$ as taken at a time where $\|\psi - \psi_{\mathrm{in}}\| \approx 0$, so that we may think of $\psi_{\mathrm{in}}$ as the
scattering state $\psi$ prior to the time of interaction with the scattering potential. In
that case, we have approximately

\[
\mathbb{P}^\psi (X_e \in \Sigma_R) \overset{R\ \text{large}}{\approx}
\int_{C_\Sigma} |\langle \mathbf{k} | S \psi \rangle|^2\, d^3k , \tag{16.48}
\]

with $\psi$ being the scattering state prior to the interaction with the potential. We shall
come back to this in the last section.


16.6 Towards the Scattering Cross-Section


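In a scattering experiment, one focuses on directions $\mathbf{k}$ different from the incoming
directions $\mathbf{k}'$ which we assume to be well localized around $\mathbf{k}_0$. This helps to simplify
formulas, because we can subtract the identity from $S$, and this is a subtraction of
the unscattered part. The resulting operator is called the T-matrix. The following
computation is for the main part already known to us. We compute $T := S - I$, i.e.,

\[
\begin{aligned}
\langle \mathbf{k} | T \psi_{\mathrm{in}} \rangle = \langle \mathbf{k} | (S - I) \psi_{\mathrm{in}} \rangle
&= \langle \mathbf{k} | (W_+^* W_- - I) \psi_{\mathrm{in}} \rangle \\
&= \int \langle \mathbf{k} | (W_+^* W_- - I) | \mathbf{k}' \rangle \langle \mathbf{k}' | \psi_{\mathrm{in}} \rangle\, d^3k' \\
&= \int \langle \mathbf{k} | (W_+^* W_- - I) | \mathbf{k}' \rangle\, \hat\psi_{\mathrm{in}}(\mathbf{k}')\, d^3k' ,
\end{aligned}
\tag{16.49}
\]

and we determine the kernel

\[
\langle \mathbf{k} | T | \mathbf{k}' \rangle = \langle \mathbf{k} | (W_+^* W_- - I) | \mathbf{k}' \rangle . \tag{16.50}
\]

Since $I = W_-^* W_-$, we obtain

\[
\langle \mathbf{k} | (W_+^* W_- - W_-^* W_-) | \mathbf{k}' \rangle = \langle \mathbf{k} | (W_+^* - W_-^*) W_- | \mathbf{k}' \rangle ,
\]

which is excellent because, in view of (16.36) and its analogue for $W_-^*$, we have

\[
W_+^* - W_-^* = - \int_{-\infty}^{\infty} i\, e^{itH_0}\, V\, e^{-itH}\, dt . \tag{16.51}
\]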


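Everything now fits beautifully together:

\[
\begin{aligned}
\langle \mathbf{k} | T | \mathbf{k}' \rangle
&= \left\langle \mathbf{k} \left| - \int_{-\infty}^{\infty} i\, e^{itH_0} V e^{-itH}\, dt\; W_- \right| \mathbf{k}' \right\rangle
 = - \int_{-\infty}^{\infty} \langle \mathbf{k} | i\, e^{itH_0} V e^{-itH} W_- | \mathbf{k}' \rangle\, dt \\
&= - \int_{-\infty}^{\infty} \langle \mathbf{k} | i\, e^{itH_0} V W_- e^{-itH_0} | \mathbf{k}' \rangle\, dt
 = - \int_{-\infty}^{\infty} \langle \mathbf{k} | i\, e^{it(k^2/2 - k'^2/2)} V W_- | \mathbf{k}' \rangle\, dt \\
&= - \int_{-\infty}^{\infty} \exp\left[ it\, \frac{k^2 - k'^2}{2} \right] dt\; \langle \mathbf{k} | iV | -,\mathbf{k}' \rangle ,
\end{aligned}
\]

where we have introduced (16.42). Now use the fact that

\[
\int_{-\infty}^{\infty} \exp\left[ it\, \frac{k^2 - k'^2}{2} \right] dt
= 2\pi\, \delta\!\left( \frac{k^2}{2} - \frac{k'^2}{2} \right) \tag{16.52}
\]

to get for (16.49)

\[
\langle \mathbf{k} | T \psi_{\mathrm{in}} \rangle = - \int 2\pi\, \delta\!\left( \frac{k^2}{2} - \frac{k'^2}{2} \right)
\langle \mathbf{k} | iV | -,\mathbf{k}' \rangle\, \hat\psi_{\mathrm{in}}(\mathbf{k}')\, d^3k' . \tag{16.53}
\]

Performing the integration in (16.53) in spherical coordinates $(k', \omega')$, with appropriate
substitution

\[
d^3k' = k'^2\, dk'\, d^2\omega' = k'\, d\!\left( \frac{k'^2}{2} \right) d^2\omega' ,
\]

yields $k = k'$, and we are left with the integration over the solid angle $\omega'$. Writing
$\mathbf{k} = (k', \omega)$, we get for (16.53)

\[
\langle (k',\omega) | T \psi_{\mathrm{in}} \rangle = -2\pi i \int k'\, \langle (k',\omega) | V | -,(k',\omega') \rangle\,
\hat\psi_{\mathrm{in}}(k',\omega')\, d^2\omega' . \tag{16.54}
\]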
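
The reduction from (16.53) to (16.54) rests on a one-dimensional delta-function identity.
As a brief sketch, for $k, k' > 0$ and a test function $g$ (a symbol introduced here only
for illustration):

\[
\delta\!\left( \frac{k^2}{2} - \frac{k'^2}{2} \right) = \frac{1}{k}\, \delta(k' - k) ,
\qquad
\int_0^\infty \delta\!\left( \frac{k^2}{2} - \frac{k'^2}{2} \right) g(k')\, k'^2\, dk' = k\, g(k) .
\]

The radial integration thus sets $k' = k$ and leaves one power of the modulus, which is
the factor standing in front of the kernel in (16.54).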


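The potential term in the integrand $\langle (k',\omega) | V | -,(k',\omega') \rangle$ is customarily written as
the kernel function

\[
\langle (k',\omega) | V | -,(k',\omega') \rangle =: T\big( (k',\omega), \mathbf{k}' \big)
= (2\pi)^{-3} \int e^{-i\mathbf{k}\cdot\mathbf{x}}\, V(\mathbf{x})\, \varphi_-(\mathbf{x},\mathbf{k}')\, d^3x , \tag{16.55}
\]

with $\mathbf{k} = (k',\omega)$ and $\mathbf{k}' = (k',\omega')$, where we have introduced (16.42).

We can now insert this into (16.47) by restricting the integration to cones $C_\Sigma$
bounded away from the incoming direction (which in a scattering experiment is
rather sharply defined), so that we may replace $S$ by $T = S - I$. Collecting all terms,
we obtain under this condition, for large $R$,

\[
\mathbb{P}^\psi (X_e \in \Sigma_R) \approx 4\pi^2 \int_0^\infty \int_\Sigma
\left| \int T\big( (k,\omega), (k,\omega') \big)\, \hat\psi_{\mathrm{in}}\big( (k,\omega') \big)\, d^2\omega' \right|^2
k^4\, d^2\omega\, dk . \tag{16.56}
\]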


Our final goal is to deal with the realistic scattering situation, where the
(normalized⁴) incoming wave packet has a well defined incoming momentum and the lateral
(transverse to the incoming momentum) shape does not vary within the range of the
potential. In other words, on the length scale of the target, the incoming wave packet
looks almost like a plane wave $e^{i\mathbf{k}_0\cdot\mathbf{x}}$. In this situation, the exit distribution yields the
basic object of scattering theory, the scattering cross-section. Turning back now to
physics, we return also to physical units.

⁴ Needless to say, physics demands that the wave packet be normalized, even when it becomes
almost a plane wave! A formal way to think of the incoming wave is that
$|\hat\psi|^2(\mathbf{k}) \approx \delta(\mathbf{k} - \mathbf{k}_0)$.


16.7 The Scattering Cross-Section


Scattering theory is a cornerstone for verifying our understanding of the universe
experimentally. The central quantity in a scattering experiment is the scattering
cross-section $\sigma_{\mathbf{k}_0}(\Sigma)$. The cross-section is an area which has the following meaning. One
prepares a beam of "identical" particles, whose wave packets are almost like plane
waves with respect to the size of the target (see Fig. 16.6). This means that each
wave packet is very sharply peaked around the wave vector $\mathbf{k}_0$ with lateral wave
number, i.e., perpendicular to $\mathbf{k}_0$, close to zero. We shall be more specific about
what this means in physical units later on, but the picture in physical space should
be clear: each wave packet has almost flat wave fronts widely overlapping the target.

Now place a surface perpendicular to $\mathbf{k}_0$, centered on the beam in front of the
target. Then count the number of particles crossing that surface per unit time. The
scattering cross-section $\sigma_{\mathbf{k}_0}(\Sigma)$ is the size of the surface for which the number of
particles crossing per unit time equals the number $N_\Sigma$ of particles scattered per unit
time into directions in $\Sigma$. By the law of large numbers, the number $N_\Sigma$ is determined
by the crossing probability (16.56), so when $N_B$ is the number of particles per unit
time within the beam (i.e., the number of particles crossing the cross-section of the
beam per unit time), then

\[
N_\Sigma = N_B\, \mathbb{P}^\psi (X_e \in \Sigma_R) . \tag{16.57}
\]

We must now evaluate $\mathbb{P}^\psi (X_e \in \Sigma_R)$ under the conditions typically satisfied in
scattering situations, to obtain the theoretical prediction for the cross-section $\sigma_{\mathbf{k}_0}(\Sigma)$.

Before doing this, in order to obtain a more complete perspective, we recall the
famous derivation of the scattering cross-section by Born. In the following, we think
of the origin as the place where the target is put. The distance $R$ is the distance of
the detectors from the origin. $R$ is assumed to be very large compared to the target
size.


16.7.1 Born’s Formula 


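In his famous 1926 papers [1, 2], Born derived a theoretical expression for the
scattering cross-section using a stationary picture. The derivation starts with the ansatz
of an idealized wave picture, where a plane wave and an outgoing spherical wave
with scattering amplitude $f$ are superposed:

\[
\varphi(\mathbf{x}) = e^{i\mathbf{k}_0\cdot\mathbf{x}} + f_{\mathbf{k}_0}(\omega)\, \frac{e^{ik_0 x}}{x} , \tag{16.58}
\]

with

\[
\omega = \frac{\mathbf{x}}{x} . \tag{16.59}
\]

From this one computes the quantum flux

\[
\mathbf{j}^\varphi(\mathbf{x},t) = \frac{\hbar}{m}\, \operatorname{Im} \big( \varphi^*(\mathbf{x},t)\, \nabla \varphi(\mathbf{x},t) \big) \tag{16.60}
\]

arising from the plane wave part $e^{i\mathbf{k}_0\cdot\mathbf{x}}$, yielding $\hbar\mathbf{k}_0/m$, and the quantum flux arising
from the spherical wave part $f_{\mathbf{k}_0}(\omega)\, e^{ik_0 x}/x$, yielding, to leading order in $1/x$,

\[
\frac{\hbar k_0}{m}\, |f_{\mathbf{k}_0}(\omega)|^2\, \frac{\omega}{x^2} .
\]

The first flux is interpreted as the flux of an incoming beam of particles with momentum
$\hbar\mathbf{k}_0$, and the second flux as the flux of the scattered particles through a
detector surface covering the solid angle $\Sigma$. The quotient of the modulus of these
fluxes is taken to be the cross-section for the solid angle $\Sigma$:

\[
\sigma_{\mathbf{k}_0}(\Sigma) = \frac{\displaystyle \int_\Sigma \frac{\hbar k_0}{m}\, |f_{\mathbf{k}_0}(\omega)|^2\, d^2\omega}{\hbar k_0 / m}
= \int_\Sigma |f_{\mathbf{k}_0}(\omega)|^2\, d^2\omega . \tag{16.61}
\]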


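To determine $|f_{\mathbf{k}_0}(\omega)|^2$ in a physical situation where particles are scattered off
a potential $V(\mathbf{x})$, i.e., where the wave function is governed by the one-particle
Schrödinger equation with Schrödinger operator

\[
H = -\frac{\hbar^2}{2m}\, \Delta + V ,
\]

one compares (16.58) with the generalized eigenfunctions $\varphi_\pm(\mathbf{x},\mathbf{k})$ solving the
Lippmann–Schwinger equations (16.37) and (16.39):

\[
\varphi_\pm(\mathbf{x},\mathbf{k}) = e^{i\mathbf{k}\cdot\mathbf{x}}
- \frac{1}{2\pi} \int \frac{e^{\mp ik|\mathbf{x}-\mathbf{x}'|}}{|\mathbf{x}-\mathbf{x}'|}\, V(\mathbf{x}')\, \varphi_\pm(\mathbf{x}',\mathbf{k})\, d^3x' .
\]

This can be used to read off the asymptotic (large $x$) behavior of the eigenfunctions.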


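The bit of computation required for this involves estimating the absolute value

\[
|\mathbf{x} - \mathbf{x}'| = x \sqrt{ 1 + \frac{x'^2}{x^2} - 2\, \frac{\mathbf{x}\cdot\mathbf{x}'}{x^2} } \approx x - \omega\cdot\mathbf{x}' ,
\]

for large $x$, where we have introduced (16.59). Neglecting terms of order $1/x^2$, one
readily sees that $\varphi_-(\mathbf{x},\mathbf{k})$ is the stationary solution which is asymptotically of the
form (16.58). This is of course no surprise, because we already anticipated that in
footnote 3. We have

\[
\varphi_-(\mathbf{x},\mathbf{k}) \approx e^{i\mathbf{k}\cdot\mathbf{x}} - \frac{1}{2\pi}\, \frac{e^{ikx}}{x}
\int e^{-ik\omega\cdot\mathbf{x}'}\, V(\mathbf{x}')\, \varphi_-(\mathbf{x}',\mathbf{k})\, d^3x' . \tag{16.62}
\]

Comparing this asymptotic form of $\varphi_-(\mathbf{x},\mathbf{k})$ for large $x$ with (16.58) suggests that

\[
f_{\mathbf{k}_0}(\omega) \approx -\frac{1}{2\pi} \int e^{-ik_0\omega\cdot\mathbf{x}}\, V(\mathbf{x})\, \varphi_-(\mathbf{x},\mathbf{k}_0)\, d^3x .
\]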

Therefore,

\[
|f_{\mathbf{k}_0}(\omega)|^2 \approx \frac{1}{4\pi^2} \left| \int e^{-ik_0\omega\cdot\mathbf{x}}\, V(\mathbf{x})\,
\varphi_-(\mathbf{x},\mathbf{k}_0)\, d^3x \right|^2 . \tag{16.63}
\]

Recalling the T-matrix kernel (16.55) and appropriately renaming the variables, we
find by comparison that we may also express the scattering amplitude $f$ in the form

\[
|f_{\mathbf{k}_0}(\omega)| = 4\pi^2\, \big| T\big( (k_0,\omega), \mathbf{k}_0 \big) \big| . \tag{16.64}
\]

Equation (16.61) then becomes

\[
\sigma_{\mathbf{k}_0}(\Sigma) = 16\pi^4 \int_\Sigma \big| T\big( (k_0,\omega), \mathbf{k}_0 \big) \big|^2\, d^2\omega . \tag{16.65}
\]

The expression looks familiar! It is very similar to the right-hand side of the crossing
probability (16.56), as indeed it should be, and yet it is not the same. The difference
is that, in (16.56), the T-kernel is inside the squared $d^2\omega'$ integration and integrated
against $\hat\psi_{\mathrm{in}}((k,\omega'))$. We now come to the evaluation of (16.56) yielding (16.65).


16.7.2 Time-Dependent Scattering 


The explanation also answers a question which has been around since Born’s deriva- 
tion: Why is Born’s stationary analysis appropriate? Scattering is a time-dependent 
physical process with normalized wave packets that move. Therefore a fundamental 
analysis must start with a normalized wave packet satisfying the conditions which 
a scattering experiment provides. From that we must calculate, under conditions 
appropriate for a scattering experiment, the probability for scattering. The general 
idea is of course that, in the scattering situation, the moving wave packet will look 
on the incoming side of the target almost like a plane wave packet. Here the word 
“almost” has to be taken with a pinch of salt, since the packet will always be square 
integrable. On the outgoing side it will almost look like a spherical wave, and a 
quasi-stationary picture emerges (see Fig. 16.6). 

We can see this using the expansion (16.41) and introducing the approximation
(16.62). With the identification of the scattering amplitude $f$, we have approximately

\[
\begin{aligned}
\psi(\mathbf{x},t) \approx{}& (2\pi)^{-3/2} \int \exp\left( -i\, \frac{\hbar k^2}{2m}\, t \right) e^{i\mathbf{k}\cdot\mathbf{x}}\,
\hat\psi_{\mathrm{in}}(\mathbf{k})\, d^3k \\
&+ (2\pi)^{-3/2} \int \exp\left( -i\, \frac{\hbar k^2}{2m}\, t \right) \frac{e^{ikx}}{x}\, f_{\mathbf{k}}(\omega)\,
\hat\psi_{\mathrm{in}}(\mathbf{k})\, d^3k .
\end{aligned}
\]

The phase in the first term has a stationary point at $\hbar\mathbf{k}/m = \mathbf{x}/t$. For an incoming
wave ($\psi \approx \psi_{\mathrm{in}}$) with a sharply defined momentum $\hbar\mathbf{k}_0$, the stationary point is
approximately $\mathbf{x}/t \approx \hbar\mathbf{k}_0/m$, so that this term contributes to the wave packet moving in
the direction $\mathbf{k}_0$. In other words, it concerns the unscattered part of the wave. The
scattered part is to be computed from the second term. For example, one may use
it to compute the probability of scattering into the cone $C_\Sigma$ (see Remark 16.2). But
since we already have the crucial formula (16.56) for the crossing probability,
which directly connects with the clicks in the detector, we shall now go on with that
formula.

Fig. 16.6 Schematic representation of a scattering experiment. Also shown at the bottom is the
support of the Fourier transform of the wave function $\psi$, which is a member of a beam of identical
wave functions impinging on the target. See the text for further explanation

As we did with (16.48), we may replace $\hat\psi_{\mathrm{in}}$ in (16.56) by the incoming scattering
state $\hat\psi$ prior to a time where the interaction with the scattering potential becomes
effective. We shall refer to the spatial location of $\psi$ at that time by saying "in front
of the target" (see Fig. 16.6). We assume that the wave packet $\psi$ moves towards the
origin (where the target sits) with a well defined momentum $\hbar\mathbf{k}_0$ such that
$\mathbf{k}\cdot\mathbf{k}_0 > 0$ for all $\mathbf{k} \in \operatorname{supp}\hat\psi$. We assume further that $\mathbf{k}_0$ is parallel to the
$\mathbf{e}_3$ direction. To
understand that the steps we take in the following are mathematically reasonable,
it is perhaps best to think of $\hat\psi$ as a Gaussian function. We shall formulate further
conditions on $\hat\psi$ along the way, in particular, conditions which are typically fulfilled
in scattering situations and which we shall discuss at the end.

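Replacing $\hat\psi_{\mathrm{in}}$ by $\hat\psi$ and introducing (16.64), equation (16.56) becomes

\[
\mathbb{P}^\psi (X_e \in \Sigma_R) \approx \frac{1}{4\pi^2} \int_0^\infty \int_\Sigma
\left| \int f_{(k,\omega')}(\omega)\, \hat\psi\big( (k,\omega') \big)\, d^2\omega' \right|^2 k^4\, dk\, d^2\omega .
\]

We assume that the scattering amplitude varies only slowly as a function of $\mathbf{k}$ on the
support of $\hat\psi$ (see Fig. 16.6):

\[
f_{\mathbf{k}}(\omega) \approx f_{\mathbf{k}_0}(\omega) , \quad \text{on } \operatorname{supp}\hat\psi . \tag{16.66}
\]

If this condition is fulfilled, the scattering amplitude may be pulled out of the integral,
whence

\[
\mathbb{P}^\psi (X_e \in \Sigma_R) \approx \frac{1}{4\pi^2} \int_\Sigma |f_{\mathbf{k}_0}(\omega)|^2\, d^2\omega
\int_0^\infty \left| \int \hat\psi\big( (k,\omega') \big)\, d^2\omega' \right|^2 k^4\, dk . \tag{16.67}
\]

Comparison with (16.61) suggests that we are nearly done. We need only understand
what the factor

\[
\frac{1}{(2\pi)^2} \int_0^\infty \left| \int \hat\psi\big( (k,\omega) \big)\, d^2\omega \right|^2 k^4\, dk \tag{16.68}
\]

is doing there. (We have renamed the integration variable $\omega'$ by $\omega$.) For this purpose,
introduce $(k_1,k_2)$ as new variables for the solid angle variables $(\vartheta, \varphi)$. A short
calculation shows that

\[
k^2\, d^2\omega = \frac{k}{k_3(k_1,k_2)}\, dk_1\, dk_2 , \quad \text{where } k_3(k_1,k_2) = \sqrt{k^2 - k_1^2 - k_2^2} .
\]

This yields

\[
\int \hat\psi\big( (k,\omega) \big)\, k^2\, d^2\omega
= \int \hat\psi\big( k_1, k_2, k_3(k_1,k_2) \big)\, \frac{k}{k_3}\, dk_1\, dk_2 . \tag{16.69}
\]

We now assume that

\[
|k_1|, |k_2| \ll k , \quad \text{i.e.,} \quad k_3 \approx k , \tag{16.70}
\]

and that

\[
\hat\psi(k_1,k_2,k_3) \approx \hat\psi(k_1,k_2,k) . \tag{16.71}
\]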

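Under these conditions and using (16.69), equation (16.68) becomes

\[
\frac{1}{(2\pi)^2} \int_0^\infty \left| \int \hat\psi\big( (k,\omega) \big)\, k^2\, d^2\omega \right|^2 dk
\approx \frac{1}{(2\pi)^2} \int_0^\infty \left| \int \hat\psi(k_1,k_2,k)\, dk_1\, dk_2 \right|^2 dk . \tag{16.72}
\]

Denoting the partial (inverse) Fourier transform by

\[
\tilde\psi(0,0,k) = \frac{1}{2\pi} \int \hat\psi(k_1,k_2,k)\, dk_1\, dk_2 ,
\]

we have

\[
\frac{1}{(2\pi)^2} \int_0^\infty \left| \int \hat\psi(k_1,k_2,k)\, dk_1\, dk_2 \right|^2 dk
= \int_0^\infty |\tilde\psi(0,0,k)|^2\, dk .
\]

Observing that $k_3 > 0$ on the support of $\hat\psi$ and using the Plancherel identity, we
finally obtain

\[
\frac{1}{(2\pi)^2} \int_0^\infty \left| \int \hat\psi\big( (k,\omega) \big)\, d^2\omega \right|^2 k^4\, dk
\approx \int_{-\infty}^{\infty} |\psi(0,0,x_3)|^2\, dx_3 = \int_{-\infty}^{0} |\psi(0,0,x_3)|^2\, dx_3 ,
\]

since $\psi(x_1,x_2,x_3) = 0$ for $x_3 > 0$.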

What is the meaning of the quantity on the right? Let $A$ be an area perpendicular
to $\mathbf{k}_0$ in front of the target, much larger than the target area but smaller than the
cross-section of the beam (see Fig. 16.6), on which the wave function satisfies

\[
|\psi(x_1,x_2,x_3)| \approx |\psi(0,0,x_3)| , \quad (x_1,x_2) \in A , \tag{16.73}
\]

i.e., on which the lateral shape of the wave function is flat. Introduce

\[
p_A := \frac{1}{|A|} \int_A \int_{-\infty}^{0} |\psi(x_1,x_2,x_3)|^2\, dx_1\, dx_2\, dx_3
\approx \int_{-\infty}^{0} |\psi(0,0,x_3)|^2\, dx_3 . \tag{16.74}
\]

$p_A |A|$ is the probability that the particle is in the cylinder $A \times (-\infty, 0)$. Assuming that
the wave packet does not spread too much laterally during the time the wave packet
crosses the surface $A$, the particle will cross $A$ with this probability, and hence we
arrive at the meaning of $p_A$ as the probability per unit area that the incoming particle
crosses the surface $A$.

Introducing $p_A$ into (16.67), we get

\[
\mathbb{P}^\psi (X_e \in \Sigma_R) \approx p_A \int_\Sigma |f_{\mathbf{k}_0}(\omega)|^2\, d^2\omega . \tag{16.75}
\]

Now recall (16.57). We multiply the probability density $p_A$ accordingly by $N_B$, the
number of particles per unit time in the beam, and get (by the law of large numbers)
the number of particles per unit area per unit time crossing $A$, i.e., $N_A/|A| \approx N_B\, p_A$.
The scattering cross-section is by definition the area we need to multiply $N_A/|A|$ by
to obtain $N_\Sigma$. We thus identify the scattering cross-section theoretically as Born's
expression (16.61).

The remaining question is whether the conditions we employed along the way
are consistent with the experimental scattering situation. The most demanding condition
is perhaps (16.66). What does this mean in spatial terms? To see that, we
approximate the generalized eigenfunction in (16.63) by a plane wave. The scattering
amplitude then becomes, in this first order approximation, the so-called Born
approximation, the Fourier transform of the potential, also referred to as the form
factor:

\[
f_{\mathbf{k}_0}(\omega) \sim \hat V(\mathbf{k} - \mathbf{k}_0) , \quad \mathbf{k} = k_0\, \omega . \tag{16.76}
\]
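
The Born approximation is easy to evaluate for a concrete potential. The sketch below
assumes a spherically symmetric Gaussian potential with hypothetical strength `V0` and
range `a`, and uses the prefactor $-1/(2\pi)$ that appears in front of the integral in the
expression for $f$ below (16.62), with the generalized eigenfunction replaced by a plane
wave. The function name and all parameter values are illustrative. The numerical radial
integral is compared with the closed-form Fourier transform of the Gaussian.

```python
import numpy as np

def born_amplitude(q, V, r_max=40.0, n=40001):
    """First Born amplitude for a radial potential V(r), as a function of q = |k - k0|.

    f(q) = -(1/(2 pi)) * int exp(-i q.x) V(|x|) d^3x
         = -(1/(2 pi)) * 4 pi * int_0^inf V(r) sin(q r)/(q r) r^2 dr
    """
    r = np.linspace(0.0, r_max, n)
    integrand = V(r) * np.sinc(q * r / np.pi) * r**2    # np.sinc(z) = sin(pi z)/(pi z)
    radial = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))  # trapezoid rule
    return -2.0 * radial

V0, a = 0.3, 1.0                                        # hypothetical potential parameters

def gaussian_V(r):
    return V0 * np.exp(-0.5 * r**2 / a**2)

k0 = 2.0
theta = np.linspace(0.0, np.pi, 7)                      # scattering angles
q = 2.0 * k0 * np.sin(0.5 * theta)                      # momentum transfer |k - k0|
f_numeric = np.array([born_amplitude(qi, gaussian_V) for qi in q])
f_exact = -(1.0 / (2.0 * np.pi)) * V0 * (2.0 * np.pi * a**2) ** 1.5 * np.exp(-0.5 * a**2 * q**2)
dsigma_domega = np.abs(f_numeric) ** 2                  # Born differential cross-section
```

The amplitude falls off on the scale $1/a$ in the momentum transfer, which is the
reciprocal relation between the range of the potential and the width of its form factor.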


For a rough understanding, let us use the reciprocal relation known from Gaussian
shapes of the width $\Delta^F x$ of a function $F$ in space and $\Delta^F k$ of the Fourier transform
$\hat F$, namely

\[
\Delta^F x \sim (\Delta^F k)^{-1} .
\]

In terms of the form factor condition, (16.66) says that

\[
\hat V(\mathbf{k} - \mathbf{k}') \approx \hat V(\mathbf{k} - \mathbf{k}_0) , \quad \text{for } \mathbf{k}' \in \operatorname{supp}\hat\psi ,
\]

whence $\hat V$ must vary on a much larger scale than $\hat\psi$, i.e., $\Delta^V k \gg \Delta^\psi k$. This means
for the widths in space that $\Delta^V x \ll \Delta^\psi x$. In other words, the wave must be very
much spread out compared to the spatial extent of the potential.

Now we come to conditions (16.70) and (16.71), of which (16.71) is the more
demanding. As already mentioned at the beginning, in order to understand what
is involved, we should think of $\hat\psi$ as having a Gaussian shape (in particular it is
positive), e.g.,

$$\hat\psi(k_1,k_2,k_3) \sim \exp\left(-\frac{k_1^2+k_2^2}{2\sigma_\perp^2} - \frac{(k_3-k_0)^2}{2\sigma_\parallel^2}\right).$$

Then (16.71) means roughly that, for $k_1,k_2,k_3$ varying within the widths $\sigma_\perp$, $\sigma_\parallel$,

$$\left|\hat\psi(k_1,k_2,k_3) - \hat\psi(k_1,k_2,k)\right| \ll \hat\psi(k_1,k_2,k_3),$$

which leads to

$$\left|\partial_{k_3}\hat\psi(k_1,k_2,k_3)\,(k-k_3)\right| \ll \hat\psi(k_1,k_2,k_3),$$

that is,

$$\left|\frac{k_3-k_0}{\sigma_\parallel^2}\,(k-k_3)\right| \ll 1.$$

Observing that

$$k - k_3 = \sqrt{k_1^2+k_2^2+k_3^2} - k_3 \approx \frac{1}{2}\,\frac{k_1^2+k_2^2}{k_3} \approx \frac{\sigma_\perp^2}{k_0},$$

one then has

$$\left|\frac{k_3-k_0}{\sigma_\parallel^2}\right|\frac{\sigma_\perp^2}{k_0} \approx \frac{\sigma_\parallel}{\sigma_\parallel^2}\,\frac{\sigma_\perp^2}{k_0} = \frac{\sigma_\perp^2}{\sigma_\parallel k_0} \ll 1.$$

We introduce the positional spreads, i.e., the longitudinal spread $\sigma^x_\parallel = \sigma_\parallel^{-1}$ and the
lateral spread $\sigma^x_\perp = \sigma_\perp^{-1}$, whereupon

$$\frac{\sigma^x_\parallel}{k_0\,(\sigma^x_\perp)^2} \ll 1. \tag{16.77}$$

This is so simple that one must wonder what it means. The answer is quite remarkable,
as this condition is equivalent to another condition, which we mentioned in
passing, to get the interpretation below (16.73). The point is that the wave function
should not spread too much in the lateral direction when passing through the surface
A. Multiplying the numerator and denominator of (16.77) by $\hbar/m$ and observing that

$$T = \frac{\sigma^x_\parallel\, m}{\hbar k_0}$$

is the time it takes for the wave function to cross the surface A, we set $\Delta = \hbar\sigma_\perp T/m$
for the lateral spread during that time, and we see that (16.77) amounts to the simple
relation:

$$\frac{\Delta}{\sigma^x_\perp} \ll 1,$$

which is the no-spreading condition.
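To get a feeling for the orders of magnitude, the no-spreading ratio can be evaluated for sample numbers. The sketch below is only an illustration under stated assumptions: the momentum widths are written `sigma_par` (longitudinal) and `sigma_perp` (lateral), the positional spreads are taken as their inverses, and the function name and the sample beam parameters are hypothetical, not taken from the text.

```python
HBAR = 1.054571817e-34  # reduced Planck constant in J*s


def no_spreading_ratio(sigma_par, sigma_perp, k0, mass):
    """Lateral spread gained while crossing A, divided by the lateral positional spread.

    sigma_par, sigma_perp: momentum-space widths in 1/m (assumed Gaussian packet)
    k0: mean wave number in 1/m; mass: particle mass in kg
    """
    x_par = 1.0 / sigma_par    # longitudinal positional spread
    x_perp = 1.0 / sigma_perp  # lateral positional spread
    crossing_time = x_par * mass / (HBAR * k0)  # packet length / group velocity
    lateral_growth = HBAR * sigma_perp * crossing_time / mass  # lateral speed * time
    return lateral_growth / x_perp
```

For a hypothetical electron packet with `k0 = 1e10`, a longitudinal spread of one micrometre (`sigma_par = 1e6`) and a lateral spread of one millimetre (`sigma_perp = 1e3`), the ratio is about 1e-10, so the condition holds with a wide margin. The result agrees with `sigma_perp**2 / (sigma_par * k0)`, the form in which the condition first appears.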

This insight also helps with another question one might raise, since there is a 
certain arbitrariness in the notion “in front of the target”. Where exactly should the 
surface A be put? As long as the longitudinal spread of the wave function is large 
compared to the distance at which A is put in front of the target, the no-spreading 
condition ensures that the exact place where A is put does not matter. 

In textbook treatments of scattering, another, much stronger no-spreading condi- 
tion is often required, namely that the wave does not spread from the time of prepa- 
ration to the time of detection (see, e.g., [22]). Its purpose is merely to simplify the 
mathematical argument, although this argument is simple enough without it, as can 
be seen from the above. However, one cannot do without the weaker condition we 
have employed here. 

In real scattering experiments, the condition (16.73) is satisfied approximately for 
A extending over the cross-section of the beam, i.e., the decay of the wave function 
at the edges of the beam is concentrated in a very small region compared to the 
cross-section of the beam. In this case $p_A \approx 1/|A|$ [see (16.74)]. When the
cross-section of the beam is very much larger than the extension of the scatterer potential,
one may worry that only a very tiny fraction of the particles are being scattered. 
Therefore realistic scattering experiments like Rutherford scattering are done with 
an extended target foil, on which many scatterers are randomly distributed [23]. The 
incoming wave then overlaps a great many small scatterers. 

Under certain conditions, which ensure no multiple scattering and no coherence
effects (unlike Bragg scattering, where coherence is the key feature), the scatterers
contribute independently to the numbers $N_\Sigma$ of scattered particles per unit time.
Therefore, if the incoming wave overlaps $n_B$ scatterers, the relevant probability
density which $N_B$ multiplies is no longer $p_A$, but

$$n_B\,p_A \approx \frac{n_B}{|A|} = \rho,$$

the density of scatterers on the target. In this case, the formula for the scattering
cross-section thus reads

$$\sigma = \frac{N_\Sigma}{N_B\,\rho}.$$
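As a small worked illustration of this counting argument, the cross-section can be computed from a scattered rate, a beam rate and the areal density of scatterers. This is a sketch under the independence assumptions stated above; the function names and the sample numbers are hypothetical and serve only to show how the units combine.

```python
def scatterer_density(n_scatterers, area):
    """Areal density of scatterers (per m^2) seen by a beam of cross-sectional area `area`."""
    if area <= 0:
        raise ValueError("area must be positive")
    return n_scatterers / area


def cross_section(scattered_rate, beam_rate, density):
    """Cross-section in m^2: scattered particles per second divided by
    (beam particles per second) * (scatterers per m^2)."""
    if beam_rate <= 0 or density <= 0:
        raise ValueError("beam_rate and density must be positive")
    return scattered_rate / (beam_rate * density)
```

For instance, with a million scatterers in a beam area of 1e-4 m^2, a beam rate of 1e6 particles per second and 20 scattered particles per second, `cross_section(20.0, 1e6, scatterer_density(1e6, 1e-4))` gives 2e-15 m^2. With a single scatterer the density reduces to 1/|A|, and the single-scatterer case is recovered.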


Mathematical Physics 




A mathematically rigorous treatment of a scattering situation along the lines dis- 
cussed in this chapter for a beam of random wave functions, which also intuitively 
models the random scatterer situation, can be found in [24], which includes many 
references on this subject. 
Finally: It was stated at the outset, that this system would not be here, and at once, perfected.
You cannot but plainly see that I have kept my word. But I now leave my cetological System 
standing thus unfinished, even as the great Cathedral of Cologne was left, with the cranes 
still standing upon the top of the uncompleted tower. For small erections may be finished by 
their first architects; grand ones, true ones, ever leave the copestone to posterity. God keep 
me from ever completing anything. This whole book is but a draught — nay, but the draught 
of a draught. Oh, Time, Strength, Cash, and Patience! 

Melville (1851), Moby Dick, Chap. 32 [1] 


What is a Bohmian quantum theory? A quantum theory that spells out what it is 
about. In other words a theory with a clear (primitive) ontology. A theory which has 
no place for mysticism, paradoxes, and superstition, and in which observation does 
not play an irreducible fundamental role. This does not mean that observation has 
no effect, or only a slight effect on an observed system. It does not mean that at all, 
as Bohmian mechanics proves. This chapter only draws one moral and its role is 
to encourage young scientists not to give up reason and rationality in their quest to 
understand how the physical world functions. 

The moral is this. It has been claimed that quantum mechanics proves that a 
reasonable (some call that classical) understanding of the world is impossible, and 
that it is impossible to write down the laws of nature in terms of a clear ontology. 
Bohmian mechanics proves that such claims are false, even embarrassingly false, 
because Bohmian mechanics is an utterly straightforward completion of quantum 
mechanics. Indeed, it is quantum mechanics. 

A satisfactory Lorentz invariant Bohmian theory still needs to be found. The fact 
that it has not been found yet does not prove that it is impossible to do so. It may 
prove, if anything, that we need to think harder, and work harder, and that we need 
to scrutinize our established modes of thinking and remain open to reasonable ideas. 
Here, reasonable means simply something like this: change our idea about spacetime,
change the structure of Schrödinger’s equation, change the ontology, make
it points, fields, strings, or make it whatever seems right for describing nature, but 
never give up ontology! 


D. Dürr, S. Teufel, Bohmian Mechanics, DOI 10.1007/978-3-540-89344-8_17,
© Springer-Verlag Berlin Heidelberg 2009